diff --git a/.gitattributes b/.gitattributes index 7f14244348..865da2ca2d 100644 --- a/.gitattributes +++ b/.gitattributes @@ -12,6 +12,11 @@ *.jpg binary *.gif binary *.ico binary +*.jpeg binary +*.mp3 binary +*.zip binary +*.bin binary + # Preserve original line endings for specific document files *.doc text eol=crlf diff --git a/.github/ISSUE_TEMPLATE/show_me_the_bug.md b/.github/ISSUE_TEMPLATE/show_me_the_bug.md index 504a2bd124..0c33f0319f 100644 --- a/.github/ISSUE_TEMPLATE/show_me_the_bug.md +++ b/.github/ISSUE_TEMPLATE/show_me_the_bug.md @@ -19,6 +19,7 @@ assignees: '' - LLM type and model name: - System version: - Python version: +- MetaGPT version or branch: diff --git a/.github/workflows/build-package.yaml b/.github/workflows/build-package.yaml index 7f4fee53e3..294a13f71a 100644 --- a/.github/workflows/build-package.yaml +++ b/.github/workflows/build-package.yaml @@ -1,8 +1,9 @@ name: Build and upload python package on: + workflow_dispatch: release: - types: [created] + types: [created, published] jobs: deploy: diff --git a/.gitignore b/.gitignore index 6bc67fa61e..aa5edd74a9 100644 --- a/.gitignore +++ b/.gitignore @@ -1,7 +1,7 @@ ### Python template # Byte-compiled / optimized / DLL files -__pycache__/ +__pycache__ *.py[cod] *$py.class @@ -27,6 +27,8 @@ share/python-wheels/ .installed.cfg *.egg MANIFEST +metagpt/tools/schemas/ +examples/data/search_kb/*.json # PyInstaller # Usually these files are written by a python scripts from a template @@ -151,9 +153,14 @@ allure-results .vscode key.yaml -data +/data/ data.ms examples/nb/ +examples/default__vector_store.json +examples/docstore.json +examples/graph_store.json +examples/image__vector_store.json +examples/index_store.json .chroma *~$* workspace/* @@ -168,6 +175,7 @@ output tmp.png .dependencies.json tests/metagpt/utils/file_repo_git +tests/data/rsp_cache_new.json *.tmp *.png htmlcov @@ -178,4 +186,5 @@ cov.xml *.faiss *-structure.csv *-structure.json -metagpt/tools/schemas \ No newline at end of file +*.dot +.python-version diff --git a/MANIFEST.in b/MANIFEST.in new file mode 100644 index 0000000000..292433f800 --- /dev/null +++ b/MANIFEST.in @@ -0,0 +1,3 @@ +recursive-include metagpt/ext/stanford_town/prompts *.txt +recursive-include metagpt/ext/stanford_town/static_dirs *.csv +recursive-include metagpt/ext/stanford_town/static_dirs *.json \ No newline at end of file diff --git a/README.md b/README.md index 72ce034125..8f5cc53935 100644 --- a/README.md +++ b/README.md @@ -26,7 +26,9 @@

## News

-🚀 March. 01, 2024: Our Data Interpreter paper is on arxiv. Find all design and benchmark details [here](https://arxiv.org/abs/2402.18679)!
+🚀 Mar. 29, 2024: [v0.8.0](https://github.com/geekan/MetaGPT/releases/tag/v0.8.0) released. Now you can import and use Data Interpreter via the PyPI package. Meanwhile, we integrated the RAG module and added support for multiple new LLMs.
+
+🚀 Mar. 14, 2024: Our **Data Interpreter** paper is on [arxiv](https://arxiv.org/abs/2402.18679). Check the [example](https://docs.deepwisdom.ai/main/en/DataInterpreter/) and [code](https://github.com/geekan/MetaGPT/tree/main/examples/di)!

🚀 Feb. 08, 2024: [v0.7.0](https://github.com/geekan/MetaGPT/releases/tag/v0.7.0) released, supporting assigning different LLMs to different Roles. We also introduced [Data Interpreter](https://github.com/geekan/MetaGPT/blob/main/examples/di/README.md), a powerful agent capable of solving a wide range of real-world problems.

@@ -55,40 +57,49 @@

Software Company Multi-Agent Schematic (Gradually Implementing)
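The v0.8.0 news above mentions the newly integrated RAG module; it is exercised later in this diff by `examples/rag_pipeline.py` and `examples/rag_search.py`. A minimal sketch of the module's entry point, assuming the `SimpleEngine` API those examples use and the sample document they add under `examples/data/rag/`:

```python
import asyncio

from metagpt.rag.engines import SimpleEngine


async def main():
    # Build an in-memory index over the sample document added in this diff,
    # then ask one question against it.
    engine = SimpleEngine.from_docs(input_files=["examples/data/rag/travel.txt"])
    answer = await engine.aquery("What does Bob like?")
    print(answer)  # expected: something like "Bob likes traveling."


asyncio.run(main())
```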

-## Install
+## Get Started

-### Pip installation
+### Installation

> Ensure that Python 3.9+ is installed on your system. You can check this by using: `python --version`.
> You can use conda like this: `conda create -n metagpt python=3.9 && conda activate metagpt`

```bash
-pip install metagpt
-# https://docs.deepwisdom.ai/main/en/guide/get_started/configuration.html
-metagpt --init-config # it will create ~/.metagpt/config2.yaml, just modify it to your needs
+pip install --upgrade metagpt
+# or `pip install --upgrade git+https://github.com/geekan/MetaGPT.git`
+# or `git clone https://github.com/geekan/MetaGPT && cd MetaGPT && pip install --upgrade -e .`
```

+For detailed installation guidance, please refer to [cli_install](https://docs.deepwisdom.ai/main/en/guide/get_started/installation.html#install-stable-version)
+ or [docker_install](https://docs.deepwisdom.ai/main/en/guide/get_started/installation.html#install-with-docker)
+
### Configuration

+You can initialize the config of MetaGPT by running the following command, or manually create the `~/.metagpt/config2.yaml` file:
+```bash
+# Check https://docs.deepwisdom.ai/main/en/guide/get_started/configuration.html for more details
+metagpt --init-config # it will create ~/.metagpt/config2.yaml, just modify it to your needs
+```
+
You can configure `~/.metagpt/config2.yaml` according to the [example](https://github.com/geekan/MetaGPT/blob/main/config/config2.example.yaml) and [doc](https://docs.deepwisdom.ai/main/en/guide/get_started/configuration.html):

```yaml
llm:
  api_type: "openai" # or azure / ollama / open_llm etc. Check LLMType for more options
-  model: "gpt-4-turbo-preview" # or gpt-3.5-turbo-1106 / gpt-4-1106-preview
+  model: "gpt-4-turbo" # or gpt-3.5-turbo-1106 / gpt-4-1106-preview
  base_url: "https://api.openai.com/v1" # or forward url / other llm url
  api_key: "YOUR_API_KEY"
```

### Usage

-After installation, you can use it as CLI
+After installation, you can use MetaGPT from the CLI

```bash
metagpt "Create a 2048 game" # this will create a repo in ./workspace
```

-or you can use it as library
+or use it as a library

```python
from metagpt.software_company import generate_repo, ProjectRepo

@@ -96,47 +107,19 @@
repo: ProjectRepo = generate_repo("Create a 2048 game") # or ProjectRepo("<path>")
-<details>
-<summary>⏬ Step 1: Download metagpt image and prepare config2.yaml :: click to expand ::</summary>
-
-```bash
-docker pull metagpt/metagpt:latest
-mkdir -p /opt/metagpt/{config,workspace}
-docker run --rm metagpt/metagpt:latest cat /app/metagpt/config/config2.yaml > /opt/metagpt/config/config2.yaml
-vim /opt/metagpt/config/config2.yaml # Change the config
-```
-
-</details>
+You can also use its [Data Interpreter](https://github.com/geekan/MetaGPT/tree/main/examples/di)
-<details>
-<summary>⏬ Step 2: Run metagpt container :: click to expand ::</summary>
-
-```bash
-docker run --name metagpt -d \
-    --privileged \
-    -v /opt/metagpt/config/config2.yaml:/app/metagpt/config/config2.yaml \
-    -v /opt/metagpt/workspace:/app/metagpt/workspace \
-    metagpt/metagpt:latest
-```
-
-</details>
-
+```python
+import asyncio
+from metagpt.roles.di.data_interpreter import DataInterpreter
-<details>
-<summary>⏬ Step 3: Use metagpt :: click to expand ::</summary>
+
+async def main():
+    di = DataInterpreter()
+    await di.run("Run data analysis on sklearn Iris dataset, include a plot")
-```bash
-docker exec -it metagpt /bin/bash
-$ metagpt "Create a 2048 game" # this will create a repo in ./workspace
+asyncio.run(main()) # or await main() in a jupyter notebook setting
```
-</details>
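`DataInterpreter.run` is a coroutine, which is why the example above drives it with `asyncio.run`. A minimal sketch, not part of this diff, of running several requirements concurrently on one event loop with `asyncio.gather`, reusing the import shown above (the requirement list is illustrative):

```python
import asyncio

from metagpt.roles.di.data_interpreter import DataInterpreter


async def run_many(requirements: list[str]) -> list:
    # One DataInterpreter per requirement; gather schedules the runs concurrently.
    return await asyncio.gather(*(DataInterpreter().run(req) for req in requirements))


if __name__ == "__main__":
    results = asyncio.run(run_many(["Run data analysis on sklearn Iris dataset, include a plot"]))
    print(results)
```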
### QuickStart & Demo Video - Try it on [MetaGPT Huggingface Space](https://huggingface.co/spaces/deepwisdom/MetaGPT) @@ -156,6 +139,7 @@ https://github.com/geekan/MetaGPT/assets/34952977/34345016-5d13-489d-b9f9-b82ace - 🧑‍💻 Contribution - [Develop Roadmap](docs/ROADMAP.md) - 🔖 Use Cases + - [Data Interpreter](https://docs.deepwisdom.ai/main/en/guide/use_cases/agent/interpreter/intro.html) - [Debate](https://docs.deepwisdom.ai/main/en/guide/use_cases/multi_agent/debate.html) - [Researcher](https://docs.deepwisdom.ai/main/en/guide/use_cases/agent/researcher.html) - [Recepit Assistant](https://docs.deepwisdom.ai/main/en/guide/use_cases/agent/receipt_assistant.html) @@ -163,10 +147,13 @@ https://github.com/geekan/MetaGPT/assets/34952977/34345016-5d13-489d-b9f9-b82ace ## Support -### Discard Join US -📢 Join Our [Discord Channel](https://discord.gg/ZRHeExS6xv)! +### Discord Join US + +📢 Join Our [Discord Channel](https://discord.gg/ZRHeExS6xv)! Looking forward to seeing you there! 🎉 -Looking forward to seeing you there! 🎉 +### Contributor form + +📝 [Fill out the form](https://airtable.com/appInfdG0eJ9J4NNL/pagK3Fh1sGclBvVkV/form) to become a contributor. We are looking forward to your participation! ### Contact Information @@ -179,7 +166,9 @@ We will respond to all questions within 2-3 business days. ## Citation -For now, cite the [arXiv paper](https://arxiv.org/abs/2308.00352): +To stay updated with the latest research and development, follow [@MetaGPT_](https://twitter.com/MetaGPT_) on Twitter. + +To cite [MetaGPT](https://arxiv.org/abs/2308.00352) or [Data Interpreter](https://arxiv.org/abs/2402.18679) in publications, please use the following BibTeX entries. ```bibtex @misc{hong2023metagpt, @@ -190,4 +179,14 @@ For now, cite the [arXiv paper](https://arxiv.org/abs/2308.00352): archivePrefix={arXiv}, primaryClass={cs.AI} } +@misc{hong2024data, + title={Data Interpreter: An LLM Agent For Data Science}, + author={Sirui Hong and Yizhang Lin and Bang Liu and Bangbang Liu and Binhao Wu and Danyang Li and Jiaqi Chen and Jiayi Zhang and Jinlin Wang and Li Zhang and Lingyao Zhang and Min Yang and Mingchen Zhuge and Taicheng Guo and Tuo Zhou and Wei Tao and Wenyi Wang and Xiangru Tang and Xiangtao Lu and Xiawu Zheng and Xinbing Liang and Yaying Fei and Yuheng Cheng and Zongze Xu and Chenglin Wu}, + year={2024}, + eprint={2402.18679}, + archivePrefix={arXiv}, + primaryClass={cs.AI} +} + ``` + diff --git a/SECURITY.md b/SECURITY.md index 623e6e5562..924ce50157 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -4,9 +4,9 @@ | Version | Supported | |---------|--------------------| - | 7.x | :x: | - | 6.x | :x: | -| < 6.x | :x: | + | 0.7.x | :x: | + | 0.6.x | :x: | +| < 0.6.x | :x: | ## Reporting a Vulnerability diff --git a/config/config2.example.yaml b/config/config2.example.yaml index 3a5cc35855..7cfd70347e 100644 --- a/config/config2.example.yaml +++ b/config/config2.example.yaml @@ -4,6 +4,7 @@ llm: api_key: "YOUR_API_KEY" model: "gpt-4-turbo-preview" # or gpt-3.5-turbo-1106 / gpt-4-1106-preview proxy: "YOUR_PROXY" # for LLM API requests + # timeout: 600 # Optional. If set to 0, default value is 300. pricing_plan: "" # Optional. If invalid, it will be automatically filled in with the value of the `model`. # Azure-exclusive pricing plan mappings: # - gpt-3.5-turbo 4k: "gpt-3.5-turbo-1106" @@ -12,6 +13,16 @@ llm: # - gpt-4 8k: "gpt-4" # See for more: https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/ +# RAG Embedding. 
+# For backward compatibility, if the embedding is not set and the llm's api_type is either openai or azure, the llm's config will be used. +embedding: + api_type: "" # openai / azure / gemini / ollama etc. Check EmbeddingType for more options. + base_url: "" + api_key: "" + model: "" + api_version: "" + embed_batch_size: 100 + repair_llm_output: true # when the output is not a valid json, try to repair it proxy: "YOUR_PROXY" # for tools like requests, playwright, selenium, etc. diff --git a/config/config2.yaml b/config/config2.yaml index 8e5825b575..ba071e8040 100644 --- a/config/config2.yaml +++ b/config/config2.yaml @@ -2,6 +2,6 @@ # Reflected Code: https://github.com/geekan/MetaGPT/blob/main/metagpt/config2.py llm: api_type: "openai" # or azure / ollama / open_llm etc. Check LLMType for more options - model: "gpt-4-turbo-preview" # or gpt-3.5-turbo-1106 / gpt-4-1106-preview + model: "gpt-4-turbo" # or gpt-3.5-turbo-1106 / gpt-4-1106-preview base_url: "https://api.openai.com/v1" # or forward url / other llm url api_key: "YOUR_API_KEY" \ No newline at end of file diff --git a/docs/README_CN.md b/docs/README_CN.md index 7a0db49747..8aea5e4cb1 100644 --- a/docs/README_CN.md +++ b/docs/README_CN.md @@ -116,7 +116,7 @@ https://github.com/geekan/MetaGPT/assets/34952977/34345016-5d13-489d-b9f9-b82ace ## 引用 -引用 [arXiv paper](https://arxiv.org/abs/2308.00352): +如果您在研究论文中使用 MetaGPT 或 Data Interpreter,请引用我们的工作: ```bibtex @misc{hong2023metagpt, @@ -127,4 +127,12 @@ https://github.com/geekan/MetaGPT/assets/34952977/34345016-5d13-489d-b9f9-b82ace archivePrefix={arXiv}, primaryClass={cs.AI} } +@misc{hong2024data, + title={Data Interpreter: An LLM Agent For Data Science}, + author={Sirui Hong and Yizhang Lin and Bang Liu and Bangbang Liu and Binhao Wu and Danyang Li and Jiaqi Chen and Jiayi Zhang and Jinlin Wang and Li Zhang and Lingyao Zhang and Min Yang and Mingchen Zhuge and Taicheng Guo and Tuo Zhou and Wei Tao and Wenyi Wang and Xiangru Tang and Xiangtao Lu and Xiawu Zheng and Xinbing Liang and Yaying Fei and Yuheng Cheng and Zongze Xu and Chenglin Wu}, + year={2024}, + eprint={2402.18679}, + archivePrefix={arXiv}, + primaryClass={cs.AI} +} ``` diff --git a/docs/README_JA.md b/docs/README_JA.md index c6b99461cb..91155532b8 100644 --- a/docs/README_JA.md +++ b/docs/README_JA.md @@ -295,7 +295,7 @@ Hugging Face Space で試す ## 引用 -現時点では、[arXiv 論文](https://arxiv.org/abs/2308.00352)を引用してください: +研究論文でMetaGPTやData Interpreterを使用する場合は、以下のように当社の作業を引用してください: ```bibtex @misc{hong2023metagpt, @@ -306,6 +306,14 @@ Hugging Face Space で試す archivePrefix={arXiv}, primaryClass={cs.AI} } +@misc{hong2024data, + title={Data Interpreter: An LLM Agent For Data Science}, + author={Sirui Hong and Yizhang Lin and Bang Liu and Bangbang Liu and Binhao Wu and Danyang Li and Jiaqi Chen and Jiayi Zhang and Jinlin Wang and Li Zhang and Lingyao Zhang and Min Yang and Mingchen Zhuge and Taicheng Guo and Tuo Zhou and Wei Tao and Wenyi Wang and Xiangru Tang and Xiangtao Lu and Xiawu Zheng and Xinbing Liang and Yaying Fei and Yuheng Cheng and Zongze Xu and Chenglin Wu}, + year={2024}, + eprint={2402.18679}, + archivePrefix={arXiv}, + primaryClass={cs.AI} +} ``` ## お問い合わせ先 diff --git a/examples/android_assistant/requirements.txt b/examples/android_assistant/requirements.txt new file mode 100644 index 0000000000..155863613c --- /dev/null +++ b/examples/android_assistant/requirements.txt @@ -0,0 +1,2 @@ +pyshine==0.0.9 +opencv-python==4.6.0.66 \ No newline at end of file diff --git a/examples/android_assistant/run_assistant.py 
b/examples/android_assistant/run_assistant.py
new file mode 100644
index 0000000000..7d5d4d5c88
--- /dev/null
+++ b/examples/android_assistant/run_assistant.py
@@ -0,0 +1,71 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc : the entry of the android assistant, covering the learning and acting stages
+# See the usage README at `metagpt/ext/android_assistant/README.md`
+
+import asyncio
+from pathlib import Path
+
+import typer
+
+from metagpt.config2 import config
+from metagpt.environment.android.android_env import AndroidEnv
+from metagpt.ext.android_assistant.roles.android_assistant import AndroidAssistant
+from metagpt.team import Team
+
+app = typer.Typer(add_completion=False, pretty_exceptions_show_locals=False)
+
+
+@app.command("", help="Run an Android Assistant")
+def startup(
+    task_desc: str = typer.Argument(help="the task description you want the android assistant to learn or act on"),
+    n_round: int = typer.Option(default=20, help="The max number of rounds for an app operation task."),
+    stage: str = typer.Option(default="learn", help="stage: learn / act"),
+    mode: str = typer.Option(default="auto", help="mode: auto / manual, used when stage=learn"),
+    app_name: str = typer.Option(default="demo", help="the name of the app you want to run"),
+    investment: float = typer.Option(default=5.0, help="Dollar amount to invest in the AI company."),
+    refine_doc: bool = typer.Option(
+        default=False, help="Refine existing operation docs based on the latest observation if True."
+    ),
+    min_dist: int = typer.Option(
+        default=30, help="The minimum distance between elements to prevent overlapping during the labeling process."
+    ),
+    android_screenshot_dir: str = typer.Option(
+        default="/sdcard/Pictures/Screenshots",
+        help="The path to store screenshots on the android device. Make sure it exists.",
+    ),
+    android_xml_dir: str = typer.Option(
+        default="/sdcard",
+        help="The path to store xml files used to determine the location of UI elements. Make sure it exists.",
+    ),
+    device_id: str = typer.Option(default="emulator-5554", help="The Android device_id"),
+):
+    config.extra = {
+        "stage": stage,
+        "mode": mode,
+        "app_name": app_name,
+        "task_desc": task_desc,
+        "refine_doc": refine_doc,
+        "min_dist": min_dist,
+        "android_screenshot_dir": android_screenshot_dir,
+        "android_xml_dir": android_xml_dir,
+        "device_id": device_id,
+    }
+
+    team = Team(
+        env=AndroidEnv(
+            device_id=device_id,
+            xml_dir=Path(android_xml_dir),
+            screenshot_dir=Path(android_screenshot_dir),
+        )
+    )
+
+    team.hire([AndroidAssistant(output_root_dir=Path(__file__).parent)])
+    team.invest(investment)
+    team.run_project(idea=task_desc)
+    asyncio.run(team.run(n_round=n_round))
+
+
+if __name__ == "__main__":
+    app()
diff --git a/examples/build_customized_agent.py b/examples/build_customized_agent.py
index cfe264b47e..7dab4833d0 100644
--- a/examples/build_customized_agent.py
+++ b/examples/build_customized_agent.py
@@ -17,7 +17,7 @@ class SimpleWriteCode(Action):
    PROMPT_TEMPLATE: str = """
-    Write a python function that can {instruction} and provide two runnnable test cases.
+    Write a python function that can {instruction} and provide two runnable test cases.
    Return ```python your_code_here ``` with NO other texts,
    your code:
    """
diff --git a/examples/data/rag/travel.txt b/examples/data/rag/travel.txt
new file mode 100644
index 0000000000..f72ad5c59e
--- /dev/null
+++ b/examples/data/rag/travel.txt
@@ -0,0 +1 @@
+Bob likes traveling.
\ No newline at end of file diff --git a/examples/data/rag/writer.txt b/examples/data/rag/writer.txt new file mode 100644 index 0000000000..1dc055901e --- /dev/null +++ b/examples/data/rag/writer.txt @@ -0,0 +1,109 @@ +Productivity +I think I am at least somewhat more productive than average, and people sometimes ask me for productivity tips. So I decided to just write them all down in one place. + +Compound growth gets discussed as a financial concept, but it works in careers as well, and it is magic. A small productivity gain, compounded over 50 years, is worth a lot. So it’s worth figuring out how to optimize productivity. If you get 10% more done and 1% better every day compared to someone else, the compounded difference is massive. + +What you work on + +Famous writers have some essential qualities, creativity and discipline + +It doesn’t matter how fast you move if it’s in a worthless direction. Picking the right thing to work on is the most important element of productivity and usually almost ignored. So think about it more! Independent thought is hard but it’s something you can get better at with practice. + +The most impressive people I know have strong beliefs about the world, which is rare in the general population. If you find yourself always agreeing with whomever you last spoke with, that’s bad. You will of course be wrong sometimes, but develop the confidence to stick with your convictions. It will let you be courageous when you’re right about something important that most people don’t see. + +I make sure to leave enough time in my schedule to think about what to work on. The best ways for me to do this are reading books, hanging out with interesting people, and spending time in nature. + +I’ve learned that I can’t be very productive working on things I don’t care about or don’t like. So I just try not to put myself in a position where I have to do them (by delegating, avoiding, or something else). Stuff that you don’t like is a painful drag on morale and momentum. + +By the way, here is an important lesson about delegation: remember that everyone else is also most productive when they’re doing what they like, and do what you’d want other people to do for you—try to figure out who likes (and is good at) doing what, and delegate that way. + +If you find yourself not liking what you’re doing for a long period of time, seriously consider a major job change. Short-term burnout happens, but if it isn’t resolved with some time off, maybe it’s time to do something you’re more interested in. + +I’ve been very fortunate to find work I like so much I’d do it for free, which makes it easy to be really productive. + +It’s important to learn that you can learn anything you want, and that you can get better quickly. This feels like an unlikely miracle the first few times it happens, but eventually you learn to trust that you can do it. + +Doing great work usually requires colleagues of some sort. Try to be around smart, productive, happy, and positive people that don’t belittle your ambitions. I love being around people who push me and inspire me to be better. To the degree you able to, avoid the opposite kind of people—the cost of letting them take up your mental cycles is horrific. + +You have to both pick the right problem and do the work. There aren’t many shortcuts. If you’re going to do something really important, you are very likely going to work both smart and hard. The biggest prizes are heavily competed for. 
This isn’t true in every field (there are great mathematicians who never spend that many hours a week working) but it is in most. + +Prioritization + +Writers have to work hard to be successful + +My system has three key pillars: “Make sure to get the important shit done”, “Don’t waste time on stupid shit”, and “make a lot of lists”. + +I highly recommend using lists. I make lists of what I want to accomplish each year, each month, and each day. Lists are very focusing, and they help me with multitasking because I don’t have to keep as much in my head. If I’m not in the mood for some particular task, I can always find something else I’m excited to do. + +I prefer lists written down on paper. It’s easy to add and remove tasks. I can access them during meetings without feeling rude. I re-transcribe lists frequently, which forces me to think about everything on the list and gives me an opportunity to add and remove items. + +I don’t bother with categorization or trying to size tasks or anything like that (the most I do is put a star next to really important items). + +I try to prioritize in a way that generates momentum. The more I get done, the better I feel, and then the more I get done. I like to start and end each day with something I can really make progress on. + +I am relentless about getting my most important projects done—I’ve found that if I really want something to happen and I push hard enough, it usually happens. + +I try to be ruthless about saying no to stuff, and doing non-critical things in the quickest way possible. I probably take this too far—for example, I am almost sure I am terse to the point of rudeness when replying to emails. + +Passion and adaptability are key qualities to writers + +I generally try to avoid meetings and conferences as I find the time cost to be huge—I get the most value out of time in my office. However, it is critical that you keep enough space in your schedule to allow for chance encounters and exposure to new people and ideas. Having an open network is valuable; though probably 90% of the random meetings I take are a waste of time, the other 10% really make up for it. + +I find most meetings are best scheduled for 15-20 minutes, or 2 hours. The default of 1 hour is usually wrong, and leads to a lot of wasted time. + +I have different times of day I try to use for different kinds of work. The first few hours of the morning are definitely my most productive time of the day, so I don’t let anyone schedule anything then. I try to do meetings in the afternoon. I take a break, or switch tasks, whenever I feel my attention starting to fade. + +I don’t think most people value their time enough—I am surprised by the number of people I know who make $100 an hour and yet will spend a couple of hours doing something they don’t want to do to save $20. + +Also, don’t fall into the trap of productivity porn—chasing productivity for its own sake isn’t helpful. Many people spend too much time thinking about how to perfectly optimize their system, and not nearly enough asking if they’re working on the right problems. It doesn’t matter what system you use or if you squeeze out every second if you’re working on the wrong thing. + +The right goal is to allocate your year optimally, not your day. + +Physical factors + +Very likely what is optimal for me won’t be optimal for you. You’ll have to experiment to find out what works best for your body. It’s definitely worth doing—it helps in all aspects of life, and you’ll feel a lot better and happier overall. 
+
+It probably took a little bit of my time every week for a few years to arrive at what works best for me, but my sense is if I do a good job at all the below I’m at least 1.5x more productive than if not.
+
+Sleep seems to be the most important physical factor in productivity for me. Some sort of sleep tracker to figure out how to sleep best is helpful. I’ve found the only things I’m consistent with are in the set-it-and-forget-it category, and I really like the Emfit QS+Active.
+
+I like a cold, dark, quiet room, and a great mattress (I resisted spending a bunch of money on a great mattress for years, which was stupid—it makes a huge difference to my sleep quality. I love this one). Not eating a lot in the few hours before sleep helps. Not drinking alcohol helps a lot, though I’m not willing to do that all the time.
+
+I use a Chili Pad to be cold while I sleep if I can’t get the room cold enough, which is great but loud (I set it up to have the cooler unit outside my room).
+
+When traveling, I use an eye mask and ear plugs.
+
+Writers usually have empathy to write good books.
+
+This is likely to be controversial, but I take a low dose of sleeping pills (like a third of a normal dose) or a very low dose of cannabis whenever I can’t sleep. I am a bad sleeper in general, and a particularly bad sleeper when I travel. It likely has tradeoffs, but so does not sleeping well. If you can already sleep well, I wouldn’t recommend this.
+
+I use a full spectrum LED light most mornings for about 10-15 minutes while I catch up on email. It’s great—if you try nothing else in here, this is the thing I’d try. It’s a ridiculous gain for me. I like this one, and it’s easy to travel with.
+
+Exercise is probably the second most important physical factor. I tried a number of different exercise programs for a few months each and the one that seemed best was lifting heavy weights 3x a week for an hour, and high intensity interval training occasionally. In addition to productivity gains, this is also the exercise program that makes me feel the best overall.
+
+The third area is nutrition. I very rarely eat breakfast, so I get about 15 hours of fasting most days (except an espresso when I wake up). I know this is contrary to most advice, and I suspect it’s not optimal for most people, but it definitely works well for me.
+
+Eating lots of sugar is the thing that makes me feel the worst and that I try hardest to avoid. I also try to avoid foods that aggravate my digestion or spike up inflammation (for example, very spicy foods). I don’t have much willpower when it comes to sweet things, so I mostly just try to keep junk food out of the house.
+
+I have one big shot of espresso immediately when I wake up and one after lunch. I assume this is about 200mg total of caffeine per day. I tried a few other configurations; this was the one that worked by far the best. I otherwise aggressively avoid stimulants, but I will have more coffee if I’m super tired and really need to get something done.
+
+If a writer wants to be super, then they should include innovative thinking.
+
+I’m vegetarian and have been since I was a kid, and I supplement methyl B-12, Omega-3, Iron, and Vitamin D-3. I got to this list with a year or so of quarterly blood tests; it’s worked for me ever since (I re-test maybe every year and a half or so). There are many doctors who will happily work with you on a super comprehensive blood test (and services like WellnessFX).
I also go out of my way to drink a lot of protein shakes, which I hate and I wouldn’t do if I weren’t vegetarian. + +Other stuff + +Here’s what I like in a workspace: natural light, quiet, knowing that I won’t be interrupted if I don’t want to be, long blocks of time, and being comfortable and relaxed (I’ve got a beautiful desk with a couple of 4k monitors on it in my office, but I spend almost all my time on my couch with my laptop). + +I wrote custom software for the annoying things I have to do frequently, which is great. I also made an effort to learn to type really fast and the keyboard shortcuts that help with my workflow. + +Like most people, I sometimes go through periods of a week or two where I just have no motivation to do anything (I suspect it may have something to do with nutrition). This sucks and always seems to happen at inconvenient times. I have not figured out what to do about it besides wait for the fog to lift, and to trust that eventually it always does. And I generally try to avoid people and situations that put me in bad moods, which is good advice whether you care about productivity or not. + +In general, I think it’s good to overcommit a little bit. I find that I generally get done what I take on, and if I have a little bit too much to do it makes me more efficient at everything, which is a way to train to avoid distractions (a great habit to build!). However, overcommitting a lot is disastrous. + +Don’t neglect your family and friends for the sake of productivity—that’s a very stupid tradeoff (and very likely a net productivity loss, because you’ll be less happy). Don’t neglect doing things you love or that clear your head either. + +Finally, to repeat one more time: productivity in the wrong direction isn’t worth anything at all. Think more about what to work on. + +Open-Mindedness and curiosity are essential to writers + diff --git a/examples/example.json b/examples/data/search_kb/example.json similarity index 100% rename from examples/example.json rename to examples/data/search_kb/example.json diff --git a/examples/example.xlsx b/examples/data/search_kb/example.xlsx similarity index 100% rename from examples/example.xlsx rename to examples/data/search_kb/example.xlsx diff --git a/examples/di/arxiv_reader.py b/examples/di/arxiv_reader.py new file mode 100644 index 0000000000..6e1939b816 --- /dev/null +++ b/examples/di/arxiv_reader.py @@ -0,0 +1,21 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +from metagpt.roles.di.data_interpreter import DataInterpreter + + +async def main(): + template = "https://arxiv.org/list/{tag}/pastweek?skip=0&show=300" + tags = ["cs.ai", "cs.cl", "cs.lg", "cs.se"] + urls = [template.format(tag=tag) for tag in tags] + prompt = f"""This is a collection of arxiv urls: '{urls}' . 
+Record each article, remove duplicates by title (they may have multiple tags), filter out papers related to
+large language model / agent / llm, print top 100 and visualize the word count of the titles"""
+    di = DataInterpreter(react_mode="react", tools=["scrape_web_playwright"])
+
+    await di.run(prompt)
+
+
+if __name__ == "__main__":
+    import asyncio
+
+    asyncio.run(main())
diff --git a/examples/di/custom_tool.py b/examples/di/custom_tool.py
new file mode 100644
index 0000000000..cbe7380c71
--- /dev/null
+++ b/examples/di/custom_tool.py
@@ -0,0 +1,36 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+"""
+@Time    : 2024/3/22 10:54
+@Author  : alexanderwu
+@File    : custom_tool.py
+"""
+
+from metagpt.roles.di.data_interpreter import DataInterpreter
+from metagpt.tools.tool_registry import register_tool
+
+
+@register_tool()
+def magic_function(arg1: str, arg2: int) -> dict:
+    """
+    The magic function that does something.
+
+    Args:
+        arg1 (str): ...
+        arg2 (int): ...
+
+    Returns:
+        dict: ...
+    """
+    return {"arg1": arg1 * 3, "arg2": arg2 * 5}
+
+
+async def main():
+    di = DataInterpreter(tools=["magic_function"])
+    await di.run("Just call the magic function with arg1 'A' and arg2 2. Tell me the result.")
+
+
+if __name__ == "__main__":
+    import asyncio
+
+    asyncio.run(main())
diff --git a/examples/di/data_visualization.py b/examples/di/data_visualization.py
index 1a21ab7cb9..184e04f266 100644
--- a/examples/di/data_visualization.py
+++ b/examples/di/data_visualization.py
@@ -1,14 +1,17 @@
import asyncio

+from metagpt.logs import logger
from metagpt.roles.di.data_interpreter import DataInterpreter
+from metagpt.utils.recovery_util import save_history


async def main(requirement: str = ""):
    di = DataInterpreter()
-    await di.run(requirement)
+    rsp = await di.run(requirement)
+    logger.info(rsp)
+    save_history(role=di)


if __name__ == "__main__":
    requirement = "Run data analysis on sklearn Iris dataset, include a plot"
-
    asyncio.run(main(requirement))
diff --git a/examples/rag_pipeline.py b/examples/rag_pipeline.py
new file mode 100644
index 0000000000..1687d556b2
--- /dev/null
+++ b/examples/rag_pipeline.py
@@ -0,0 +1,247 @@
+"""RAG pipeline"""
+
+import asyncio
+
+from pydantic import BaseModel
+
+from metagpt.const import DATA_PATH, EXAMPLE_DATA_PATH
+from metagpt.logs import logger
+from metagpt.rag.engines import SimpleEngine
+from metagpt.rag.schema import (
+    ChromaIndexConfig,
+    ChromaRetrieverConfig,
+    ElasticsearchIndexConfig,
+    ElasticsearchRetrieverConfig,
+    ElasticsearchStoreConfig,
+    FAISSRetrieverConfig,
+    LLMRankerConfig,
+)
+from metagpt.utils.exceptions import handle_exception
+
+DOC_PATH = EXAMPLE_DATA_PATH / "rag/writer.txt"
+QUESTION = "What are key qualities to be a good writer?"
+
+TRAVEL_DOC_PATH = EXAMPLE_DATA_PATH / "rag/travel.txt"
+TRAVEL_QUESTION = "What does Bob like?"
+
+LLM_TIP = "If you are not sure, just answer I don't know."
+
+
+class Player(BaseModel):
+    """To demonstrate rag add objs."""
+
+    name: str = ""
+    goal: str = "Win The 100-meter Sprint."
+    tool: str = "Red Bull Energy Drink."
+
+    def rag_key(self) -> str:
+        """For search"""
+        return self.goal
+
+
+class RAGExample:
+    """Show how to use RAG."""
+
+    def __init__(self, engine: SimpleEngine = None):
+        self._engine = engine
+
+    @property
+    def engine(self):
+        if not self._engine:
+            self._engine = SimpleEngine.from_docs(
+                input_files=[DOC_PATH],
+                retriever_configs=[FAISSRetrieverConfig()],
+                ranker_configs=[LLMRankerConfig()],
+            )
+        return self._engine
+
+    @engine.setter
+    def engine(self, value: SimpleEngine):
+        self._engine = value
+
+    async def run_pipeline(self, question=QUESTION, print_title=True):
+        """This example runs the rag pipeline with a faiss retriever and llm ranker; it will print something like:
+
+        Retrieve Result:
+        0. Productivi..., 10.0
+        1. I wrote cu..., 7.0
+        2. I highly r..., 5.0
+
+        Query Result:
+        Passion, adaptability, open-mindedness, creativity, discipline, and empathy are key qualities to be a good writer.
+        """
+        if print_title:
+            self._print_title("Run Pipeline")
+
+        nodes = await self.engine.aretrieve(question)
+        self._print_retrieve_result(nodes)
+
+        answer = await self.engine.aquery(question)
+        self._print_query_result(answer)
+
+    async def add_docs(self):
+        """This example shows how to add docs.
+
+        Before adding docs, the llm answers "I don't know".
+        After adding docs, the llm gives the correct answer; it will print something like:
+
+        [Before add docs]
+        Retrieve Result:
+
+        Query Result:
+        Empty Response
+
+        [After add docs]
+        Retrieve Result:
+        0. Bob like..., 10.0
+
+        Query Result:
+        Bob likes traveling.
+        """
+        self._print_title("Add Docs")
+
+        travel_question = f"{TRAVEL_QUESTION}{LLM_TIP}"
+        travel_filepath = TRAVEL_DOC_PATH
+
+        logger.info("[Before add docs]")
+        await self.run_pipeline(question=travel_question, print_title=False)
+
+        logger.info("[After add docs]")
+        self.engine.add_docs([travel_filepath])
+        await self.run_pipeline(question=travel_question, print_title=False)
+
+    @handle_exception
+    async def add_objects(self, print_title=True):
+        """This example shows how to add objects.
+
+        Before adding objects, the engine retrieves nothing.
+        After adding objects, the engine gives the correct answer; it will print something like:
+
+        [Before add objs]
+        Retrieve Result:
+
+        [After add objs]
+        Retrieve Result:
+        0. 100m Sprin..., 10.0
+
+        [Object Detail]
+        {'name': 'Mike', 'goal': 'Win The 100-meter Sprint', 'tool': 'Red Bull Energy Drink'}
+        """
+        if print_title:
+            self._print_title("Add Objects")
+
+        player = Player(name="Mike")
+        question = f"{player.rag_key()}"
+
+        logger.info("[Before add objs]")
+        await self._retrieve_and_print(question)
+
+        logger.info("[After add objs]")
+        self.engine.add_objs([player])
+
+        try:
+            nodes = await self._retrieve_and_print(question)
+
+            logger.info("[Object Detail]")
+            player: Player = nodes[0].metadata["obj"]
+            logger.info(player.name)
+        except Exception as e:
+            logger.error(f"nodes is empty, the llm didn't answer correctly, exception: {e}")
+
+    async def init_objects(self):
+        """This example shows how to init the engine from objs; the output is the same as add_objects."""
+        self._print_title("Init Objects")
+
+        pre_engine = self.engine
+        self.engine = SimpleEngine.from_objs(retriever_configs=[FAISSRetrieverConfig()])
+        await self.add_objects(print_title=False)
+        self.engine = pre_engine
+
+    async def init_and_query_chromadb(self):
+        """This example shows how to use chromadb and how to save and load its index; it will print something like:
+
+        Query Result:
+        Bob likes traveling.
+        """
+        self._print_title("Init And Query ChromaDB")
+
+        # 1. save index
+        output_dir = DATA_PATH / "rag"
+        SimpleEngine.from_docs(
+            input_files=[TRAVEL_DOC_PATH],
+            retriever_configs=[ChromaRetrieverConfig(persist_path=output_dir)],
+        )
+
+        # 2. load index
+        engine = SimpleEngine.from_index(index_config=ChromaIndexConfig(persist_path=output_dir))
+
+        # 3. query
+        answer = await engine.aquery(TRAVEL_QUESTION)
+        self._print_query_result(answer)
+
+    @handle_exception
+    async def init_and_query_es(self):
+        """This example shows how to use elasticsearch and how to save and load its index; it will print something like:
+
+        Query Result:
+        Bob likes traveling.
+        """
+        self._print_title("Init And Query Elasticsearch")
+
+        # 1. create es index and save docs
+        store_config = ElasticsearchStoreConfig(index_name="travel", es_url="http://127.0.0.1:9200")
+        engine = SimpleEngine.from_docs(
+            input_files=[TRAVEL_DOC_PATH],
+            retriever_configs=[ElasticsearchRetrieverConfig(store_config=store_config)],
+        )
+
+        # 2. load index
+        engine = SimpleEngine.from_index(index_config=ElasticsearchIndexConfig(store_config=store_config))
+
+        # 3. query
+        answer = await engine.aquery(TRAVEL_QUESTION)
+        self._print_query_result(answer)
+
+    @staticmethod
+    def _print_title(title):
+        logger.info(f"{'#'*30} {title} {'#'*30}")
+
+    @staticmethod
+    def _print_retrieve_result(result):
+        """Print retrieve result."""
+        logger.info("Retrieve Result:")
+
+        for i, node in enumerate(result):
+            logger.info(f"{i}. {node.text[:10]}..., {node.score}")
+
+        logger.info("")
+
+    @staticmethod
+    def _print_query_result(result):
+        """Print query result."""
+        logger.info("Query Result:")
+
+        logger.info(f"{result}\n")
+
+    async def _retrieve_and_print(self, question):
+        nodes = await self.engine.aretrieve(question)
+        self._print_retrieve_result(nodes)
+        return nodes
+
+
+async def main():
+    """RAG pipeline"""
+    e = RAGExample()
+    await e.run_pipeline()
+    await e.add_docs()
+    await e.add_objects()
+    await e.init_objects()
+    await e.init_and_query_chromadb()
+    await e.init_and_query_es()
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
diff --git a/examples/rag_search.py b/examples/rag_search.py
new file mode 100644
index 0000000000..258c5ba60f
--- /dev/null
+++ b/examples/rag_search.py
@@ -0,0 +1,21 @@
+"""Agent with RAG search."""
+
+import asyncio
+
+from examples.rag_pipeline import DOC_PATH, QUESTION
+from metagpt.logs import logger
+from metagpt.rag.engines import SimpleEngine
+from metagpt.roles import Sales
+
+
+async def search():
+    """Agent with RAG search."""
+
+    store = SimpleEngine.from_docs(input_files=[DOC_PATH])
+    role = Sales(profile="Sales", store=store)
+    result = await role.run(QUESTION)
+    logger.info(result)
+
+
+if __name__ == "__main__":
+    asyncio.run(search())
diff --git a/examples/search_kb.py b/examples/search_kb.py
deleted file mode 100644
index 995720cc14..0000000000
--- a/examples/search_kb.py
+++ /dev/null
@@ -1,33 +0,0 @@
-#!/usr/bin/env python
-# -*- coding: utf-8 -*-
-"""
-@File : search_kb.py
-@Modified By: mashenquan, 2023-12-22. Delete useless codes.
-""" -import asyncio - -from langchain.embeddings import OpenAIEmbeddings - -from metagpt.config2 import config -from metagpt.const import DATA_PATH, EXAMPLE_PATH -from metagpt.document_store import FaissStore -from metagpt.logs import logger -from metagpt.roles import Sales - - -def get_store(): - llm = config.get_openai_llm() - embedding = OpenAIEmbeddings(openai_api_key=llm.api_key, openai_api_base=llm.base_url) - return FaissStore(DATA_PATH / "example.json", embedding=embedding) - - -async def search(): - store = FaissStore(EXAMPLE_PATH / "example.json") - role = Sales(profile="Sales", store=store) - query = "Which facial cleanser is good for oily skin?" - result = await role.run(query) - logger.info(result) - - -if __name__ == "__main__": - asyncio.run(search()) diff --git a/examples/search_with_specific_engine.py b/examples/search_with_specific_engine.py index 1eee762d5a..276431ed85 100644 --- a/examples/search_with_specific_engine.py +++ b/examples/search_with_specific_engine.py @@ -13,7 +13,7 @@ async def main(): question = "What are the most interesting human facts?" search = Config.default().search - kwargs = {"api_key": search.api_key, "cse_id": search.cse_id, "proxy": None} + kwargs = search.model_dump() await Searcher(search_engine=SearchEngine(engine=search.api_type, **kwargs)).run(question) diff --git a/metagpt/environment/android_env/__init__.py b/examples/stanford_town/__init__.py similarity index 100% rename from metagpt/environment/android_env/__init__.py rename to examples/stanford_town/__init__.py diff --git a/examples/stanford_town/requirements.txt b/examples/stanford_town/requirements.txt new file mode 100644 index 0000000000..e69de29bb2 diff --git a/examples/stanford_town/run_st_game.py b/examples/stanford_town/run_st_game.py new file mode 100644 index 0000000000..1a2d50f21e --- /dev/null +++ b/examples/stanford_town/run_st_game.py @@ -0,0 +1,94 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : entry of Stanford Town(ST/st) game +# README see `metagpt/ext/stanford_town/README.md` + +import asyncio +from typing import Optional + +import fire + +from metagpt.ext.stanford_town.roles.st_role import STRole +from metagpt.ext.stanford_town.stanford_town import StanfordTown +from metagpt.ext.stanford_town.utils.const import STORAGE_PATH +from metagpt.ext.stanford_town.utils.mg_ga_transform import ( + get_reverie_meta, + write_curr_sim_code, + write_curr_step, +) +from metagpt.ext.stanford_town.utils.utils import copy_folder +from metagpt.logs import logger + + +async def startup( + idea: str, fork_sim_code: str, sim_code: str, temp_storage_path: str, investment: float = 30.0, n_round: int = 500 +): + town = StanfordTown() + logger.info("StanfordTown init environment") + + # copy `storage/{fork_sim_code}` to `storage/{sim_code}` + copy_folder(str(STORAGE_PATH.joinpath(fork_sim_code)), str(STORAGE_PATH.joinpath(sim_code))) + + # get role names from `storage/{simulation_name}/reverie/meta.json` and then init roles + reverie_meta = get_reverie_meta(fork_sim_code) + roles = [] + sim_path = STORAGE_PATH.joinpath(sim_code) + sim_path.mkdir(exist_ok=True) + for idx, role_name in enumerate(reverie_meta["persona_names"]): + has_inner_voice = True if idx == 0 else False + role = STRole( + name=role_name, + profile=role_name, + sim_code=sim_code, + step=reverie_meta.get("step", 0), + start_time=reverie_meta.get("start_date"), + curr_time=reverie_meta.get("curr_time"), + sec_per_step=reverie_meta.get("sec_per_step"), + has_inner_voice=has_inner_voice, + ) + 
roles.append(role) + + # init temp_storage + write_curr_sim_code({"sim_code": sim_code}, temp_storage_path) + write_curr_step({"step": reverie_meta.get("step", 0)}, temp_storage_path) + + await town.hire(roles) + + town.invest(investment) + town.run_project(idea) + + await town.run(n_round) + + +def main( + idea: str, + fork_sim_code: str, + sim_code: str, + temp_storage_path: Optional[str] = None, + investment: float = 30.0, + n_round: int = 500, +): + """ + Args: + idea: idea works as an `inner voice` to the first agent. + fork_sim_code: old simulation name to start with, choose one inside `generative_agents/environment/frontend_server/storage/` + sim_code: new simulation name to save simulation result + temp_storage_path: generative_agents temp_storage path inside `environment/frontend_server` to interact. + investment: the investment of running agents + n_round: rounds to run agents + """ + + asyncio.run( + startup( + idea=idea, + fork_sim_code=fork_sim_code, + sim_code=sim_code, + temp_storage_path=temp_storage_path, + investment=investment, + n_round=n_round, + ) + ) + + +if __name__ == "__main__": + fire.Fire(main) diff --git a/examples/stanford_town/storage/.gitignore b/examples/stanford_town/storage/.gitignore new file mode 100644 index 0000000000..9628208614 --- /dev/null +++ b/examples/stanford_town/storage/.gitignore @@ -0,0 +1,4 @@ +# path to store simulation data +test_* +unittest* +July* \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/environment/0.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/environment/0.json new file mode 100644 index 0000000000..6eaa46c510 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/environment/0.json @@ -0,0 +1,26 @@ +{ + "Isabella Rodriguez": { + "maze": "the_ville", + "x": 72, + "y": 14 + }, + "Klaus Mueller": { + "maze": "the_ville", + "x": 126, + "y": 46 + }, + "Maria Lopez": { + "maze": "the_ville", + "x": 123, + "y": 57 + } +} + + + + + + + + + diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/associative_memory/embeddings.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/associative_memory/embeddings.json new file mode 100644 index 0000000000..9e26dfeeb6 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/associative_memory/embeddings.json @@ -0,0 +1 @@ +{} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/associative_memory/kw_strength.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/associative_memory/kw_strength.json new file mode 100644 index 0000000000..6dc73c1c85 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/associative_memory/kw_strength.json @@ -0,0 +1,2 @@ +{"kw_strength_event": {}, + "kw_strength_thought": {}} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/associative_memory/nodes.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/associative_memory/nodes.json 
new file mode 100644 index 0000000000..9e26dfeeb6 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/associative_memory/nodes.json @@ -0,0 +1 @@ +{} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/scratch.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/scratch.json new file mode 100644 index 0000000000..dbed4b705e --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/scratch.json @@ -0,0 +1,51 @@ +{ + "vision_r": 8, + "att_bandwidth": 8, + "retention": 8, + "curr_time": null, + "curr_tile": null, + "daily_plan_req": "Isabella Rodriguez opens Hobbs Cafe at 8am everyday, and works at the counter until 8pm, at which point she closes the cafe.", + "name": "Isabella Rodriguez", + "first_name": "Isabella", + "last_name": "Rodriguez", + "age": 34, + "innate": "friendly, outgoing, hospitable", + "learned": "Isabella Rodriguez is a cafe owner of Hobbs Cafe who loves to make people feel welcome. She is always looking for ways to make the cafe a place where people can come to relax and enjoy themselves.", + "currently": "Isabella Rodriguez is planning on having a Valentine's Day party at Hobbs Cafe with her customers on February 14th, 2023 at 5pm. She is gathering party material, and is telling everyone to join the party at Hobbs Cafe on February 14th, 2023, from 5pm to 7pm.", + "lifestyle": "Isabella Rodriguez goes to bed around 11pm, awakes up around 6am.", + "living_area": "the Ville:Isabella Rodriguez's apartment:main room", + "concept_forget": 100, + "daily_reflection_time": 180, + "daily_reflection_size": 5, + "overlap_reflect_th": 4, + "kw_strg_event_reflect_th": 10, + "kw_strg_thought_reflect_th": 9, + + "recency_w": 1, + "relevance_w": 1, + "importance_w": 1, + "recency_decay": 0.995, + "importance_trigger_max": 150, + "importance_trigger_curr": 150, + "importance_ele_n": 0, + "thought_count": 5, + + "daily_req": [], + "f_daily_schedule": [], + "f_daily_schedule_hourly_org": [], + "act_address": null, + "act_start_time": null, + "act_duration": null, + "act_description": null, + "act_pronunciatio": null, + "act_event": ["Isabella Rodriguez", null, null], + "act_obj_description": null, + "act_obj_pronunciatio": null, + "act_obj_event": [null, null, null], + "chatting_with": null, + "chat": null, + "chatting_with_buffer": {}, + "chatting_end_time": null, + "act_path_set": false, + "planned_path": [] +} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/spatial_memory.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/spatial_memory.json new file mode 100644 index 0000000000..f881579508 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Isabella Rodriguez/bootstrap_memory/spatial_memory.json @@ -0,0 +1,66 @@ +{ + "the Ville": { + "Hobbs Cafe": { + "cafe": [ + "refrigerator", + "cafe customer seating", + "cooking area", + "kitchen sink", + "behind the cafe counter", + "piano" + ] + }, + "Isabella Rodriguez's apartment": { + "main room": [ + "bed", + "desk", + "refrigerator", + "closet", + "shelf" + ] + }, + "The Rose and Crown Pub": { + "pub": [ + "shelf", 
+ "refrigerator", + "bar customer seating", + "behind the bar counter", + "kitchen sink", + "cooking area", + "microphone" + ] + }, + "Harvey Oak Supply Store": { + "supply store": [ + "supply store product shelf", + "behind the supply store counter", + "supply store counter" + ] + }, + "The Willows Market and Pharmacy": { + "store": [ + "behind the pharmacy counter", + "pharmacy store shelf", + "pharmacy store counter", + "grocery store shelf", + "behind the grocery counter", + "grocery store counter" + ] + }, + "Dorm for Oak Hill College": { + "garden": [ + "dorm garden" + ], + "common room": [ + "common room sofa", + "pool table", + "common room table" + ] + }, + "Johnson Park": { + "park": [ + "park garden" + ] + } + } +} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/associative_memory/embeddings.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/associative_memory/embeddings.json new file mode 100644 index 0000000000..9e26dfeeb6 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/associative_memory/embeddings.json @@ -0,0 +1 @@ +{} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/associative_memory/kw_strength.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/associative_memory/kw_strength.json new file mode 100644 index 0000000000..6dc73c1c85 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/associative_memory/kw_strength.json @@ -0,0 +1,2 @@ +{"kw_strength_event": {}, + "kw_strength_thought": {}} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/associative_memory/nodes.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/associative_memory/nodes.json new file mode 100644 index 0000000000..9e26dfeeb6 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/associative_memory/nodes.json @@ -0,0 +1 @@ +{} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/scratch.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/scratch.json new file mode 100644 index 0000000000..7b0ce7d722 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/scratch.json @@ -0,0 +1,51 @@ +{ + "vision_r": 8, + "att_bandwidth": 8, + "retention": 8, + "curr_time": null, + "curr_tile": null, + "daily_plan_req": "Klaus Mueller goes to the library at Oak Hill College early in the morning, spends his days writing, and eats at Hobbs Cafe.", + "name": "Klaus Mueller", + "first_name": "Klaus", + "last_name": "Mueller", + "age": 20, + "innate": "kind, inquisitive, passionate", + "learned": "Klaus Mueller is a student at Oak Hill College studying sociology. 
He is passionate about social justice and loves to explore different perspectives.", + "currently": "Klaus Mueller is writing a research paper on the effects of gentrification in low-income communities.", + "lifestyle": "Klaus Mueller goes to bed around 11pm, awakes up around 7am, eats dinner around 5pm.", + "living_area": "the Ville:Dorm for Oak Hill College:Klaus Mueller's room", + "concept_forget": 100, + "daily_reflection_time": 180, + "daily_reflection_size": 5, + "overlap_reflect_th": 4, + "kw_strg_event_reflect_th": 10, + "kw_strg_thought_reflect_th": 9, + + "recency_w": 1, + "relevance_w": 1, + "importance_w": 1, + "recency_decay": 0.99, + "importance_trigger_max": 150, + "importance_trigger_curr": 150, + "importance_ele_n": 0, + "thought_count": 5, + + "daily_req": [], + "f_daily_schedule": [], + "f_daily_schedule_hourly_org": [], + "act_address": null, + "act_start_time": null, + "act_duration": null, + "act_description": null, + "act_pronunciatio": null, + "act_event": ["Klaus Mueller", null, null], + "act_obj_description": null, + "act_obj_pronunciatio": null, + "act_obj_event": [null, null, null], + "chatting_with": null, + "chat": null, + "chatting_with_buffer": {}, + "chatting_end_time": null, + "act_path_set": false, + "planned_path": [] +} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/spatial_memory.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/spatial_memory.json new file mode 100644 index 0000000000..4f41686772 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Klaus Mueller/bootstrap_memory/spatial_memory.json @@ -0,0 +1,86 @@ +{ + "the Ville": { + "Oak Hill College": { + "hallway": [], + "library": [ + "library sofa", + "library table", + "bookshelf" + ], + "classroom": [ + "blackboard", + "classroom podium", + "classroom student seating" + ] + }, + "Dorm for Oak Hill College": { + "garden": [ + "dorm garden" + ], + "Klaus Mueller's room": [ + "bed", + "game console", + "closet", + "desk" + ], + "woman's bathroom": [ + "toilet", + "shower", + "bathroom sink" + ], + "common room": [ + "common room sofa", + "pool table", + "common room table" + ], + "man's bathroom": [ + "shower", + "bathroom sink", + "toilet" + ] + }, + "The Willows Market and Pharmacy": { + "store": [ + "grocery store shelf", + "behind the grocery counter", + "grocery store counter", + "pharmacy store shelf", + "pharmacy store counter", + "behind the pharmacy counter" + ] + }, + "Harvey Oak Supply Store": { + "supply store": [ + "supply store product shelf", + "behind the supply store counter", + "supply store counter" + ] + }, + "Johnson Park": { + "park": [ + "park garden" + ] + }, + "The Rose and Crown Pub": { + "pub": [ + "shelf", + "refrigerator", + "bar customer seating", + "behind the bar counter", + "kitchen sink", + "cooking area", + "microphone" + ] + }, + "Hobbs Cafe": { + "cafe": [ + "refrigerator", + "cafe customer seating", + "cooking area", + "kitchen sink", + "behind the cafe counter", + "piano" + ] + } + } +} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/associative_memory/embeddings.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/associative_memory/embeddings.json new file mode 100644 index 0000000000..9e26dfeeb6 
--- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/associative_memory/embeddings.json @@ -0,0 +1 @@ +{} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/associative_memory/kw_strength.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/associative_memory/kw_strength.json new file mode 100644 index 0000000000..6dc73c1c85 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/associative_memory/kw_strength.json @@ -0,0 +1,2 @@ +{"kw_strength_event": {}, + "kw_strength_thought": {}} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/associative_memory/nodes.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/associative_memory/nodes.json new file mode 100644 index 0000000000..9e26dfeeb6 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/associative_memory/nodes.json @@ -0,0 +1 @@ +{} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/scratch.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/scratch.json new file mode 100644 index 0000000000..c3a304952d --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/scratch.json @@ -0,0 +1,51 @@ +{ + "vision_r": 8, + "att_bandwidth": 8, + "retention": 8, + "curr_time": null, + "curr_tile": null, + "daily_plan_req": "Maria Lopez spends at least 3 hours a day Twitch streaming or gaming.", + "name": "Maria Lopez", + "first_name": "Maria", + "last_name": "Lopez", + "age": 21, + "innate": "energetic, enthusiastic, inquisitive", + "learned": "Maria Lopez is a student at Oak Hill College studying physics and a part time Twitch game streamer who loves to connect with people and explore new ideas.", + "currently": "Maria Lopez is working on her physics degree and streaming games on Twitch to make some extra money. She visits Hobbs Cafe for studying and eating just about everyday.", + "lifestyle": "Maria Lopez goes to bed around 2am, awakes up around 9am, eats dinner around 6pm. 
She likes to hang out at Hobbs Cafe if it's before 6pm.", + "living_area": "the Ville:Dorm for Oak Hill College:Maria Lopez's room", + "concept_forget": 100, + "daily_reflection_time": 180, + "daily_reflection_size": 5, + "overlap_reflect_th": 4, + "kw_strg_event_reflect_th": 10, + "kw_strg_thought_reflect_th": 9, + + "recency_w": 1, + "relevance_w": 1, + "importance_w": 1, + "recency_decay": 0.99, + "importance_trigger_max": 150, + "importance_trigger_curr": 150, + "importance_ele_n": 0, + "thought_count": 5, + + "daily_req": [], + "f_daily_schedule": [], + "f_daily_schedule_hourly_org": [], + "act_address": null, + "act_start_time": null, + "act_duration": null, + "act_description": null, + "act_pronunciatio": null, + "act_event": ["Maria Lopez", null, null], + "act_obj_description": null, + "act_obj_pronunciatio": null, + "act_obj_event": [null, null, null], + "chatting_with": null, + "chat": null, + "chatting_with_buffer": {}, + "chatting_end_time": null, + "act_path_set": false, + "planned_path": [] +} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/spatial_memory.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/spatial_memory.json new file mode 100644 index 0000000000..0a58212bda --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/personas/Maria Lopez/bootstrap_memory/spatial_memory.json @@ -0,0 +1,87 @@ +{ + "the Ville": { + "Oak Hill College": { + "hallway": [], + "library": [ + "library sofa", + "library table", + "bookshelf" + ], + "classroom": [ + "blackboard", + "classroom podium", + "classroom student seating" + ] + }, + "Dorm for Oak Hill College": { + "garden": [ + "dorm garden" + ], + "Maria Lopez's room": [ + "closet", + "desk", + "bed", + "computer", + "blackboard" + ], + "woman's bathroom": [ + "toilet", + "shower", + "bathroom sink" + ], + "common room": [ + "common room sofa", + "pool table", + "common room table" + ], + "man's bathroom": [ + "shower", + "bathroom sink", + "toilet" + ] + }, + "The Willows Market and Pharmacy": { + "store": [ + "grocery store shelf", + "behind the grocery counter", + "grocery store counter", + "pharmacy store shelf", + "pharmacy store counter", + "behind the pharmacy counter" + ] + }, + "Harvey Oak Supply Store": { + "supply store": [ + "supply store product shelf", + "behind the supply store counter", + "supply store counter" + ] + }, + "Johnson Park": { + "park": [ + "park garden" + ] + }, + "The Rose and Crown Pub": { + "pub": [ + "shelf", + "refrigerator", + "bar customer seating", + "behind the bar counter", + "kitchen sink", + "cooking area", + "microphone" + ] + }, + "Hobbs Cafe": { + "cafe": [ + "refrigerator", + "cafe customer seating", + "cooking area", + "kitchen sink", + "behind the cafe counter", + "piano" + ] + } + } +} \ No newline at end of file diff --git a/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/reverie/meta.json b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/reverie/meta.json new file mode 100644 index 0000000000..1e81ec12d2 --- /dev/null +++ b/examples/stanford_town/storage/base_the_ville_isabella_maria_klaus/reverie/meta.json @@ -0,0 +1,13 @@ +{ + "fork_sim_code": "base_the_ville_isabella_maria_klaus", + "start_date": "February 13, 2023", + "curr_time": "February 13, 2023, 00:00:00", + "sec_per_step": 10, + "maze_name": "the_ville", + "persona_names": [ + "Isabella 
Rodriguez", + "Maria Lopez", + "Klaus Mueller" + ], + "step": 0 +} \ No newline at end of file diff --git a/examples/stream_output_via_api.py b/examples/stream_output_via_api.py new file mode 100644 index 0000000000..5961f3a082 --- /dev/null +++ b/examples/stream_output_via_api.py @@ -0,0 +1,117 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +""" +@Time : 2024/3/27 9:44 +@Author : leiwu30 +@File : stream_output_via_api.py +@Description : Stream log information and communicate over the network via web api. +""" +import asyncio +import json +import socket +import threading +from contextvars import ContextVar + +from flask import Flask, Response, jsonify, request, send_from_directory + +from metagpt.const import TUTORIAL_PATH +from metagpt.logs import logger, set_llm_stream_logfunc +from metagpt.roles.tutorial_assistant import TutorialAssistant +from metagpt.utils.stream_pipe import StreamPipe + +app = Flask(__name__) + + +def stream_pipe_log(content): + print(content, end="") + stream_pipe = stream_pipe_var.get(None) + if stream_pipe: + stream_pipe.set_message(content) + + +def write_tutorial(message): + async def main(idea, stream_pipe): + stream_pipe_var.set(stream_pipe) + role = TutorialAssistant() + await role.run(idea) + + def thread_run(idea: str, stream_pipe: StreamPipe = None): + """ + Convert asynchronous function to thread function + """ + asyncio.run(main(idea, stream_pipe)) + + stream_pipe = StreamPipe() + thread = threading.Thread( + target=thread_run, + args=( + message["content"], + stream_pipe, + ), + ) + thread.start() + + while thread.is_alive(): + msg = stream_pipe.get_message() + yield stream_pipe.msg2stream(msg) + + +@app.route("/v1/chat/completions", methods=["POST"]) +def completions(): + """ + data: { + "model": "write_tutorial", + "stream": true, + "messages": [ + { + "role": "user", + "content": "Write a tutorial about MySQL" + } + ] + } + """ + + data = json.loads(request.data) + logger.info(json.dumps(data, indent=4, ensure_ascii=False)) + + # Non-streaming interfaces are not supported yet + stream_type = True if data.get("stream") else False + if not stream_type: + return jsonify({"status": 400, "msg": "Non-streaming requests are not supported, please use `stream=True`."}) + + # Only accept the last user information + # openai['model'] ~ MetaGPT['agent'] + last_message = data["messages"][-1] + model = data["model"] + + # write_tutorial + if model == "write_tutorial": + return Response(write_tutorial(last_message), mimetype="text/plain") + else: + return jsonify({"status": 400, "msg": "No suitable agent found."}) + + +@app.route("/download/") +def download_file(filename): + return send_from_directory(TUTORIAL_PATH, filename, as_attachment=True) + + +if __name__ == "__main__": + """ + curl https://$server_address:$server_port/v1/chat/completions -X POST -d '{ + "model": "write_tutorial", + "stream": true, + "messages": [ + { + "role": "user", + "content": "Write a tutorial about MySQL" + } + ] + }' + """ + server_port = 7860 + server_address = socket.gethostbyname(socket.gethostname()) + + set_llm_stream_logfunc(stream_pipe_log) + stream_pipe_var: ContextVar[StreamPipe] = ContextVar("stream_pipe") + app.run(port=server_port, host=server_address) diff --git a/examples/werewolf_game/evals/eval.py b/examples/werewolf_game/evals/eval.py new file mode 100644 index 0000000000..c890773de0 --- /dev/null +++ b/examples/werewolf_game/evals/eval.py @@ -0,0 +1,218 @@ +""" +Filename: MetaGPT/examples/werewolf_game/evals/eval.py +Created Date: Oct 18, 2023 +Updated Date: Oct 
24, 2023 +Author: [Aria](https://github.com/ariafyy) +Info: evaluate the Voting Accuracy Rate of non_werewolves and Vote Difficulty +""" + +import glob +import os +import re +from pathlib import Path + +import numpy as np +import pandas as pd +from tqdm import tqdm +from utils import Utils + +from metagpt.const import DEFAULT_WORKSPACE_ROOT, METAGPT_ROOT +from metagpt.environment.werewolf.const import RoleType + + +class Vote: + """Vote Evaluation""" + + def __init__(self): + self.OUT_PATH = DEFAULT_WORKSPACE_ROOT / "outputs" + os.makedirs(self.OUT_PATH, exist_ok=True) + self.SUB_FOLDER_LIST = ["01-10", "11-20", "21-30"] + + def _get_log_fileslist(self, IN_PATH) -> list[str]: + files_list = [] + for SUB_FOLDER in self.SUB_FOLDER_LIST: + files_list.extend(glob.glob(str(IN_PATH / SUB_FOLDER / "*.txt"))) + return files_list + + def extract_votes_from_logs(self, files_list: list): + for in_logfile in tqdm(files_list): + SUB_FOLDER = (Path(in_logfile).parent).stem + out_txtfile = self.OUT_PATH / "# {0}_{1}.txt".format(SUB_FOLDER, Path(in_logfile).stem) + Utils().pick_vote_log(in_logfile, out_txtfile) + votefiles_list = Utils().get_file_list(self.OUT_PATH) + return votefiles_list + + @staticmethod + def parse_vote_text2chunks(text: str): + """ + parse each game vote log into text chunks + + one chunk example: + ['Player1', 'Player2', 'Player3', 'Player5', 'Player6']. Say ONLY: I vote to eliminate ... + Player1(Witch): 49 | I vote to eliminate Player5 + Player2(Villager): 49 | I vote to eliminate Player5 + Player3(Villager): 49 | I vote to eliminate Player5 + Player5(Werewolf): 49 | I vote to eliminate Player6 + Player6(Seer): 49 | I vote to eliminate Player5 + """ + pattern = re.compile(r"""\[([^\]]+)\]. Say ONLY: I vote to eliminate ...""") + chunks = {} + chunk_id = 0 + last_end = 0 + for match in pattern.finditer(text): + start = match.start() + chunk = text[last_end:start] + chunks[f"vote_{chunk_id}"] = chunk.strip() + last_end = match.end() + chunk_id += 1 + final_chunk = text[last_end:].strip() + if final_chunk: + chunks[f"vote_{chunk_id}"] = final_chunk + return chunks + + def _vote_rate_players(self, text: str): + """ + # calculate the rate at which the good team votes out werewolves + :example: + + input: + ['Player1', 'Player2', 'Player3', 'Player5', 'Player6']. Say ONLY: I vote to eliminate ...
+ Player1(Witch): 49 | I vote to eliminate Player5 + Player2(Villager): 49 | I vote to eliminate Player5 + Player3(Villager): 49 | I vote to eliminate Player5 + Player5(Werewolf): 49 | I vote to eliminate Player6 + Player6(Seer): 49 | I vote to eliminate Player5 + + output: + werewolves: ['Player5'] + non_werewolves: ['Player1', 'Player2', 'Player3', 'Player6'] + as you can see: Player2(Villager) and Player3(Villager) vote to eliminate Player5(Werewolf) + :return: good team vote rate: 100.00% + """ + pattern = re.compile(r"(\w+)\(([^\)]+)\): \d+ \| I vote to eliminate (\w+)") + # find all werewolves + werewolves = [] + for match in pattern.finditer(text): + if match.group(2) == RoleType.WEREWOLF.value: + werewolves.append(match.group(1)) + + # find all non_werewolves + non_werewolves = [] + for match in pattern.finditer(text): + if match.group(2) != RoleType.WEREWOLF.value: + non_werewolves.append(match.group(1)) + num_non_werewolves = len(non_werewolves) + + # count how many players other than werewolves made the correct vote + correct_votes = 0 + for match in pattern.finditer(text): + if match.group(2) != RoleType.WEREWOLF.value and match.group(3) in werewolves: + correct_votes += 1 + + # calculate the vote accuracy rate of non_werewolves + rate = correct_votes / num_non_werewolves + good_vote_rate = round(rate, 2) + return {"good_vote_rate": good_vote_rate, "werewolves": werewolves, "non_werewolves": non_werewolves} + + def get_goodteam_vote_rate(self, text: str) -> float: + goodteam_vote_rate = self._vote_rate_players(text)["good_vote_rate"] + return goodteam_vote_rate + + def get_werewolves(self, text: str) -> list: + werewolves_list = self._vote_rate_players(text)["werewolves"] + return werewolves_list + + def get_non_werewolves(self, text: str) -> list: + non_werewolves_list = self._vote_rate_players(text)["non_werewolves"] + return non_werewolves_list + + def get_votewolf_difficulty(self, werewolves: list, non_werewolves: list) -> str: + num_living_wolfs = len(werewolves) + num_living_players = len(werewolves) + len(non_werewolves) + votewolf_difficulty = "_{0} / {1}".format(num_living_wolfs, num_living_players) + return votewolf_difficulty + + def get_result_df(self, out_txtfile: str) -> pd.DataFrame: + """ + folder: sub folders for evals + file: evaluation file, each file represents one game + votes: the number of votes, e.g.
vote_1 represents the first vote of this game, + good_vote_rate: the rate at which good players vote against werewolves, + i.e. correct_votes / the total number of players other than werewolves + total_votes: the total number of votes cast + """ + with open(out_txtfile, "r") as out_file: + text = out_file.read() + chunks = self.parse_vote_text2chunks(text) + res = [] + for k, v in chunks.items(): + if v != "": + chunks_list = list(chunks.keys()) + total_votes = len(chunks_list) - 1 + werewolves = self.get_werewolves(v) + non_werewolves = self.get_non_werewolves(v) + good_vote_rate = self.get_goodteam_vote_rate(v) + votewolf_difficulty = self.get_votewolf_difficulty(werewolves, non_werewolves) + folder = Utils().filename_to_foldername(out_txtfile) + result = { + "folder": folder, + "file": Path(out_txtfile).stem + ".txt", + "vote_round": k, + "good_vote_rate": good_vote_rate, + "total_votes": total_votes, + "votewolf_difficulty": votewolf_difficulty, + } + res.append(result) + df = pd.DataFrame(res) + return df + + def calc_avg_rate(self, IN_PATH) -> pd.DataFrame: + """ + get avg_rate for each game + avg_rate: the average good_vote_rate across all votes in the game + vote1_rate: First Round Voting Accuracy Rate + """ + infiles_list = self._get_log_fileslist(IN_PATH) + votefiles_list = self.extract_votes_from_logs(infiles_list) + df_list = [self._load_df_from_file(file) for file in votefiles_list] + combined_df = pd.concat(df_list, ignore_index=True) + # calculate the average good_vote_rate for each file + mean_rates = self._calculate_mean_rates(combined_df) + combined_df["avg_rate"] = combined_df["file"].map(mean_rates) + # calculate vote1 rate + vote1_rates = self._calc_vote1_rates(combined_df) + combined_df["vote1_rate"] = combined_df["folder"].map(vote1_rates.set_index("folder")["good_vote_rate"]) + combined_df.loc[combined_df["vote_round"] != "vote_1", "vote1_rate"] = np.nan + combined_df["vote1_rate"] = combined_df["vote1_rate"].apply(self._format_rates) + combined_df["good_vote_rate"] = combined_df["good_vote_rate"].apply(self._format_rates) + combined_df["avg_rate"] = combined_df["avg_rate"].apply(self._format_rates) + combined_df.sort_values(["file"], ascending=True, inplace=True) + return combined_df + + def _calc_vote1_rates(self, df): + df_vote1 = df[df["vote_round"] == "vote_1"] + vote1_rates = df_vote1.groupby("folder")["good_vote_rate"].mean().reset_index() + return vote1_rates + + def _load_df_from_file(self, file): + return self.get_result_df(file) + + def _calculate_mean_rates(self, df): + return df.groupby("file")["good_vote_rate"].mean() + + def _format_rates(self, s): + return Utils().float_to_percent(s) + + def get_eval_csv(self, IN_PATH, EVAL_RESULT): + """ + IN_PATH: parent folder of ["01-10", "11-20", "21-30"] + EVAL_RESULT: output csv file path + """ + combined_df = self.calc_avg_rate(IN_PATH) + combined_df.to_csv(EVAL_RESULT, index=False) + + +if __name__ == "__main__": + IN_PATH = METAGPT_ROOT / "examples/werewolf_game/evals" + EVAL_RESULT = DEFAULT_WORKSPACE_ROOT / "outputs" / "goodteam_vote_rate.csv" + Vote().get_eval_csv(IN_PATH, EVAL_RESULT) diff --git a/examples/werewolf_game/evals/utils.py b/examples/werewolf_game/evals/utils.py new file mode 100644 index 0000000000..490e7126fa --- /dev/null +++ b/examples/werewolf_game/evals/utils.py @@ -0,0 +1,134 @@ +""" +Filename: MetaGPT/examples/werewolf_game/evals/utils.py +Created Date: Oct 11, 2023 +Revised Date: Oct 20, 2023 +Author: [Aria](https://github.com/ariafyy) +""" +import glob +import os +import re +from pathlib
import Path + +from metagpt.const import METAGPT_ROOT + + +class Utils: + """Utility helpers for processing game logs""" + + @staticmethod + def polish_log(in_logfile, out_txtfile): + """polish logs for evaluation""" + pattern_text = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \| (\w+) +\| ([\w\.]+:\w+:\d+) - (.*\S)" + pattern_player = r"(Player(\d{1}): \w+)" + pattern_start = False + json_start = False + + with open(in_logfile, "r") as f, open(out_txtfile, "w") as out: + for line in f.readlines(): + matches = re.match(pattern_text, line) + if matches: + message = matches.group(4).strip() + pattern_start = True + json_start = False + + if ( + "Moderator(Moderator) ready to InstructSpeak" not in message + and "Moderator(Moderator) ready to ParseSpeak" not in message + and "Total running cost:" not in message + ): + out.write("- " + message + "\n") + else: + out.write("\n") + + elif pattern_start and not matches: + if "gpt-4 may update over time" in line: + line = "" + out.write(line) + + elif line.strip().startswith("{"): + out.write(line.strip()) + json_start = True + + elif json_start and not line.strip().endswith("}"): + out.write(line.strip()) + + elif json_start and line.strip().endswith("}"): + out.write(line.strip()) + json_start = False + + elif ( + line.startswith("(User):") or line.startswith("********** STEP:") or re.search(pattern_player, line) + ): + out.write(line) + + else: + out.write("\n") + + @staticmethod + def pick_vote_log(in_logfile, out_txtfile): + """ + pick the vote log from the log file. + `ready to AnnounceGameResult` serves as the HINT_TEXT which indicates the end of the game. + discussion based on observation and reflection is not part of the vote session. + """ + pattern_vote = r"(Player\d+)\(([A-Za-z]+)\): (\d+) \| (I vote to eliminate Player\d+)" + ignore_text = """reflection""" + HINT_TEXT = r"ready to AnnounceGameResult" + pattern_moderator = r"\[([^\]]+)\]\. Say ONLY: I vote to eliminate ..."
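+ # The patterns above drive a small state machine: a moderator line matching + # pattern_moderator opens a voting block, lines matching pattern_vote inside the + # block are written out, and a line containing ignore_text ("reflection") closes + # the block, since reflection-based discussion is not part of the vote.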
+ in_valid_block = False + + with open(in_logfile, "r") as f: + lines = f.read() + split_lines = lines.split(HINT_TEXT) + + if len(split_lines) < 2: + print(f"Key text :{HINT_TEXT} not found in {in_logfile}") + return + + relevant_lines = split_lines[1].split("\n") + with open(out_txtfile, "w") as out: + for line in relevant_lines: + if re.search(pattern_moderator, line): + in_valid_block = True + out.write(line.lstrip() + "\n") + + elif in_valid_block and re.search(pattern_vote, line): + out.write(line + "\n") + elif ignore_text in line: + in_valid_block = False + + @staticmethod + def get_file_list(path: str) -> list: + file_pattern = os.path.join(path, "*.txt") + files_list = glob.glob(file_pattern) + return files_list + + @staticmethod + def filename_to_foldername(out_txtfile: str): + """ + convert filename into its parent folder name + input:"....../# 01-10_10132100.txt" + output:# 01-10 + """ + s = Path(out_txtfile).stem + pattern_folder = r"([^_]*)_" + match = re.match(pattern_folder, s) + if match: + folder = match.group(1) + return folder + + @staticmethod + def float_to_percent(decimal: float) -> str: + """ + input: 1.00 + output: 100.00% + """ + percent = decimal * 100 + return f"{percent:.2f}%" + + +if __name__ == "__main__": + in_logfile = METAGPT_ROOT / "logs/log.txt" + out_txtfile = "input your wish path" + # Utils().polish_log(in_logfile, out_txtfile) + Utils().pick_vote_log(in_logfile, out_txtfile) diff --git a/examples/werewolf_game/start_game.py b/examples/werewolf_game/start_game.py new file mode 100644 index 0000000000..fe31c6c559 --- /dev/null +++ b/examples/werewolf_game/start_game.py @@ -0,0 +1,68 @@ +import asyncio + +import fire + +from metagpt.ext.werewolf.roles import Guard, Moderator, Seer, Villager, Werewolf, Witch +from metagpt.ext.werewolf.roles.human_player import prepare_human_player +from metagpt.ext.werewolf.werewolf_game import WerewolfGame +from metagpt.logs import logger + + +async def start_game( + investment: float = 3.0, + n_round: int = 5, + shuffle: bool = True, + add_human: bool = False, + use_reflection: bool = True, + use_experience: bool = False, + use_memory_selection: bool = False, + new_experience_version: str = "", +): + game = WerewolfGame() + game_setup, players = game.env.init_game_setup( + role_uniq_objs=[Villager, Werewolf, Guard, Seer, Witch], + num_werewolf=2, + num_villager=2, + shuffle=shuffle, + add_human=add_human, + use_reflection=use_reflection, + use_experience=use_experience, + use_memory_selection=use_memory_selection, + new_experience_version=new_experience_version, + prepare_human_player=prepare_human_player, + ) + logger.info(f"{game_setup}") + + players = [Moderator()] + players + game.hire(players) + game.invest(investment) + game.run_project(game_setup) + await game.run(n_round=n_round) + + +def main( + investment: float = 20.0, + n_round: int = 100, + shuffle: bool = True, + add_human: bool = False, + use_reflection: bool = True, + use_experience: bool = False, + use_memory_selection: bool = False, + new_experience_version: str = "", +): + asyncio.run( + start_game( + investment, + n_round, + shuffle, + add_human, + use_reflection, + use_experience, + use_memory_selection, + new_experience_version, + ) + ) + + +if __name__ == "__main__": + fire.Fire(main) diff --git a/metagpt/actions/action_node.py b/metagpt/actions/action_node.py index 09da4a9887..07638ce423 100644 --- a/metagpt/actions/action_node.py +++ b/metagpt/actions/action_node.py @@ -17,6 +17,7 @@ from tenacity import retry, stop_after_attempt, 
wait_random_exponential from metagpt.actions.action_outcls_registry import register_action_outcls +from metagpt.const import USE_CONFIG_TIMEOUT from metagpt.llm import BaseLLM from metagpt.logs import logger from metagpt.provider.postprocess.llm_output_postprocess import llm_output_postprocess @@ -330,7 +331,7 @@ def keys(self, mode: str = "auto") -> list: def compile_to(self, i: Dict, schema, kv_sep) -> str: if schema == "json": - return json.dumps(i, indent=4) + return json.dumps(i, indent=4, ensure_ascii=False) elif schema == "markdown": return dict_to_markdown(i, kv_sep=kv_sep) else: @@ -339,10 +340,7 @@ def compile_to(self, i: Dict, schema, kv_sep) -> str: def tagging(self, text, schema, tag="") -> str: if not tag: return text - if schema == "json": - return f"[{tag}]\n" + text + f"\n[/{tag}]" - else: # markdown - return f"[{tag}]\n" + text + f"\n[/{tag}]" + return f"[{tag}]\n{text}\n[/{tag}]" def _compile_f(self, schema, mode, tag, format_func, kv_sep, exclude=None) -> str: nodes = self.to_dict(format_func=format_func, mode=mode, exclude=exclude) @@ -374,7 +372,7 @@ def compile(self, context, schema="json", mode="children", template=SIMPLE_TEMPL schema="markdown": compile context, example(markdown), instruction(markdown), constraint, action """ if schema == "raw": - return context + "\n\n## Actions\n" + LANGUAGE_CONSTRAINT + "\n" + self.instruction + return f"{context}\n\n## Actions\n{LANGUAGE_CONSTRAINT}\n{self.instruction}" ### Use pydantic BaseModel directly to generate instruction and example, JSON only # child_class = self._create_children_class() @@ -416,7 +414,7 @@ async def _aask_v1( images: Optional[Union[str, list[str]]] = None, system_msgs: Optional[list[str]] = None, schema="markdown", # compatible to original format - timeout=3, + timeout=USE_CONFIG_TIMEOUT, ) -> (str, BaseModel): """Use ActionOutput to wrap the output of aask""" content = await self.llm.aask(prompt, system_msgs, images=images, timeout=timeout) @@ -448,9 +446,10 @@ def set_llm(self, llm): def set_context(self, context): self.set_recursive("context", context) - async def simple_fill(self, schema, mode, images: Optional[Union[str, list[str]]] = None, timeout=3, exclude=None): + async def simple_fill( + self, schema, mode, images: Optional[Union[str, list[str]]] = None, timeout=USE_CONFIG_TIMEOUT, exclude=None + ): prompt = self.compile(context=self.context, schema=schema, mode=mode, exclude=exclude) - if schema != "raw": mapping = self.get_mapping(mode, exclude=exclude) class_name = f"{self.key}_AN" @@ -473,7 +472,7 @@ async def fill( mode="auto", strgy="simple", images: Optional[Union[str, list[str]]] = None, - timeout=3, + timeout=USE_CONFIG_TIMEOUT, exclude=[], ): """Fill the node(s) with mode.
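Worth a note on the `timeout` changes in `action_node.py` above: the hardcoded `timeout=3` defaults become a `USE_CONFIG_TIMEOUT` sentinel (defined as `0` in `metagpt/const.py` later in this diff), so callers that pass nothing defer to the `llm.timeout` value from `config2.yaml`, with `LLM_API_TIMEOUT = 300` as the last resort. A minimal sketch of that fallback chain, using a hypothetical `resolve_timeout` helper that stands in for the provider-side logic:

```python
USE_CONFIG_TIMEOUT = 0  # sentinel: defer to llm.timeout from config2.yaml
LLM_API_TIMEOUT = 300   # last-resort default when nothing is configured


def resolve_timeout(call_timeout: int, config_timeout: int) -> int:
    """Per-call timeout wins; the falsy sentinel falls back to config, then the default."""
    return call_timeout or config_timeout or LLM_API_TIMEOUT


assert resolve_timeout(USE_CONFIG_TIMEOUT, 600) == 600  # sentinel defers to config
assert resolve_timeout(30, 600) == 30                   # explicit per-call value wins
assert resolve_timeout(USE_CONFIG_TIMEOUT, 0) == 300    # falls through to the default
```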
diff --git a/metagpt/actions/di/write_analysis_code.py b/metagpt/actions/di/write_analysis_code.py index 185926e311..711e56d39b 100644 --- a/metagpt/actions/di/write_analysis_code.py +++ b/metagpt/actions/di/write_analysis_code.py @@ -18,7 +18,7 @@ STRUCTUAL_PROMPT, ) from metagpt.schema import Message, Plan -from metagpt.utils.common import CodeParser, process_message, remove_comments +from metagpt.utils.common import CodeParser, remove_comments class WriteAnalysisCode(Action): @@ -50,7 +50,7 @@ async def run( ) working_memory = working_memory or [] - context = process_message([Message(content=structual_prompt, role="user")] + working_memory) + context = self.llm.format_msg([Message(content=structual_prompt, role="user")] + working_memory) # LLM call if use_reflection: diff --git a/metagpt/actions/rebuild_sequence_view.py b/metagpt/actions/rebuild_sequence_view.py index 0e67de9086..2aac9bf203 100644 --- a/metagpt/actions/rebuild_sequence_view.py +++ b/metagpt/actions/rebuild_sequence_view.py @@ -486,7 +486,7 @@ def parse_participant(mermaid_sequence_diagram: str) -> List[str]: Returns: List[str]: A list of participants extracted from the sequence diagram. """ - pattern = r"participant ([a-zA-Z\.0-9_]+)" + pattern = r"participant ([\w\.]+)" matches = re.findall(pattern, mermaid_sequence_diagram) matches = [re.sub(r"[\\/'\"]+", "", i) for i in matches] return matches diff --git a/metagpt/actions/research.py b/metagpt/actions/research.py index ce8d8a9677..2a99a8d99e 100644 --- a/metagpt/actions/research.py +++ b/metagpt/actions/research.py @@ -134,7 +134,7 @@ def gen_msg(): break model_name = config.llm.model - prompt = reduce_message_length(gen_msg(), model_name, system_text, 4096) + prompt = reduce_message_length(gen_msg(), model_name, system_text, config.llm.max_token) logger.debug(prompt) queries = await self._aask(prompt, [system_text]) try: diff --git a/metagpt/actions/write_code_plan_and_change_an.py b/metagpt/actions/write_code_plan_and_change_an.py index f99bffd849..a909469816 100644 --- a/metagpt/actions/write_code_plan_and_change_an.py +++ b/metagpt/actions/write_code_plan_and_change_an.py @@ -128,6 +128,9 @@ def add_numbers(): ## User New Requirements {requirement} +## Issue +{issue} + ## PRD {prd} @@ -211,7 +214,8 @@ async def run(self, *args, **kwargs): design_doc = await self.repo.docs.system_design.get(filename=self.i_context.design_filename) task_doc = await self.repo.docs.task.get(filename=self.i_context.task_filename) context = CODE_PLAN_AND_CHANGE_CONTEXT.format( - requirement=self.i_context.requirement, + requirement=f"```text\n{self.i_context.requirement}\n```", + issue=f"```text\n{self.i_context.issue}\n```", prd=prd_doc.content, design=design_doc.content, task=task_doc.content, diff --git a/metagpt/actions/write_prd_an.py b/metagpt/actions/write_prd_an.py index 5733b29da7..6a995e1840 100644 --- a/metagpt/actions/write_prd_an.py +++ b/metagpt/actions/write_prd_an.py @@ -133,10 +133,10 @@ REFINED_REQUIREMENT_ANALYSIS = ActionNode( key="Refined Requirement Analysis", expected_type=List[str], - instruction="Review and refine the existing requirement analysis to align with the evolving needs of the project " + instruction="Review and refine the existing requirement analysis into a string list to align with the evolving needs of the project " "due to incremental development. 
Ensure the analysis comprehensively covers the new features and enhancements " "required for the refined project scope.", - example=["Require add/update/modify ..."], + example=["Require add ...", "Require modify ..."], ) REQUIREMENT_POOL = ActionNode( diff --git a/metagpt/config2.py b/metagpt/config2.py index bc6af18c62..58a99c920b 100644 --- a/metagpt/config2.py +++ b/metagpt/config2.py @@ -12,6 +12,7 @@ from pydantic import BaseModel, model_validator from metagpt.configs.browser_config import BrowserConfig +from metagpt.configs.embedding_config import EmbeddingConfig from metagpt.configs.llm_config import LLMConfig, LLMType from metagpt.configs.mermaid_config import MermaidConfig from metagpt.configs.redis_config import RedisConfig @@ -47,6 +48,9 @@ class Config(CLIParams, YamlModel): # Key Parameters llm: LLMConfig + # RAG Embedding + embedding: EmbeddingConfig = EmbeddingConfig() + # Global Proxy. Will be used if llm.proxy is not set proxy: str = "" @@ -75,6 +79,7 @@ class Config(CLIParams, YamlModel): iflytek_api_key: str = "" azure_tts_subscription_key: str = "" azure_tts_region: str = "" + _extra: dict = dict() # extra config dict @classmethod def from_home(cls, path): @@ -92,7 +97,7 @@ def default(cls): """ default_config_paths: List[Path] = [ METAGPT_ROOT / "config/config2.yaml", - Path.home() / ".metagpt/config2.yaml", + CONFIG_ROOT / "config2.yaml", ] dicts = [dict(os.environ)] @@ -100,6 +105,20 @@ def default(cls): final = merge_dict(dicts) return Config(**final) + @classmethod + def from_llm_config(cls, llm_config: dict): + """user config llm + example: + llm_config = {"api_type": "xxx", "api_key": "xxx", "model": "xxx"} + gpt4 = Config.from_llm_config(llm_config) + A = Role(name="A", profile="Democratic candidate", goal="Win the election", actions=[a1], watch=[a2], config=gpt4) + """ + llm_config = LLMConfig.model_validate(llm_config) + dicts = [dict(os.environ)] + dicts += [{"llm": llm_config}] + final = merge_dict(dicts) + return Config(**final) + def update_via_cli(self, project_path, project_name, inc, reqa_file, max_auto_summarize_code): """update config via cli""" @@ -113,6 +132,14 @@ def update_via_cli(self, project_path, project_name, inc, reqa_file, max_auto_su self.reqa_file = reqa_file self.max_auto_summarize_code = max_auto_summarize_code + @property + def extra(self): + return self._extra + + @extra.setter + def extra(self, value: dict): + self._extra = value + def get_openai_llm(self) -> Optional[LLMConfig]: """Get OpenAI LLMConfig by name. If no OpenAI, raise Exception""" if self.llm.api_type == LLMType.OPENAI: diff --git a/metagpt/configs/embedding_config.py b/metagpt/configs/embedding_config.py new file mode 100644 index 0000000000..20de479996 --- /dev/null +++ b/metagpt/configs/embedding_config.py @@ -0,0 +1,50 @@ +from enum import Enum +from typing import Optional + +from pydantic import field_validator + +from metagpt.utils.yaml_model import YamlModel + + +class EmbeddingType(Enum): + OPENAI = "openai" + AZURE = "azure" + GEMINI = "gemini" + OLLAMA = "ollama" + + +class EmbeddingConfig(YamlModel): + """Config for Embedding. 
+ + Examples: + --------- + api_type: "openai" + api_key: "YOUR_API_KEY" + + api_type: "azure" + api_key: "YOUR_API_KEY" + base_url: "YOUR_BASE_URL" + api_version: "YOUR_API_VERSION" + + api_type: "gemini" + api_key: "YOUR_API_KEY" + + api_type: "ollama" + base_url: "YOUR_BASE_URL" + model: "YOUR_MODEL" + """ + + api_type: Optional[EmbeddingType] = None + api_key: Optional[str] = None + base_url: Optional[str] = None + api_version: Optional[str] = None + + model: Optional[str] = None + embed_batch_size: Optional[int] = None + + @field_validator("api_type", mode="before") + @classmethod + def check_api_type(cls, v): + if v == "": + return None + return v diff --git a/metagpt/configs/llm_config.py b/metagpt/configs/llm_config.py index 77bfc8466a..af8f56372f 100644 --- a/metagpt/configs/llm_config.py +++ b/metagpt/configs/llm_config.py @@ -10,6 +10,7 @@ from pydantic import field_validator +from metagpt.const import LLM_API_TIMEOUT from metagpt.utils.yaml_model import YamlModel @@ -29,6 +30,7 @@ class LLMType(Enum): DASHSCOPE = "dashscope" # Aliyun LingJi DashScope MOONSHOT = "moonshot" MISTRAL = "mistral" + YI = "yi" # lingyiwanwu def __missing__(self, key): return self.OPENAI @@ -73,7 +75,7 @@ class LLMConfig(YamlModel): stream: bool = False logprobs: Optional[bool] = None # https://cookbook.openai.com/examples/using_logprobs top_logprobs: Optional[int] = None - timeout: int = 60 + timeout: int = 600 # For Network proxy: Optional[str] = None @@ -87,3 +89,8 @@ def check_llm_key(cls, v): if v in ["", None, "YOUR_API_KEY"]: raise ValueError("Please set your API key in config2.yaml") return v + + @field_validator("timeout") + @classmethod + def check_timeout(cls, v): + return v or LLM_API_TIMEOUT diff --git a/metagpt/configs/mermaid_config.py b/metagpt/configs/mermaid_config.py index 50c8a18475..47f14f4cd0 100644 --- a/metagpt/configs/mermaid_config.py +++ b/metagpt/configs/mermaid_config.py @@ -13,7 +13,7 @@ class MermaidConfig(YamlModel): """Config for Mermaid""" - engine: Literal["nodejs", "ink", "playwright", "pyppeteer"] = "nodejs" + engine: Literal["nodejs", "ink", "playwright", "pyppeteer", "none"] = "nodejs" path: str = "mmdc" # mmdc puppeteer_config: str = "" pyppeteer_path: str = "/usr/bin/google-chrome-stable" diff --git a/metagpt/configs/search_config.py b/metagpt/configs/search_config.py index af928b02a1..e28b14c994 100644 --- a/metagpt/configs/search_config.py +++ b/metagpt/configs/search_config.py @@ -7,6 +7,8 @@ """ from typing import Callable, Optional +from pydantic import Field + from metagpt.tools import SearchEngineType from metagpt.utils.yaml_model import YamlModel @@ -18,3 +20,11 @@ class SearchConfig(YamlModel): api_key: str = "" cse_id: str = "" # for google search_func: Optional[Callable] = None + params: dict = Field( + default_factory=lambda: { + "engine": "google", + "google_domain": "google.com", + "gl": "us", + "hl": "en", + } + ) diff --git a/metagpt/const.py b/metagpt/const.py index a5e3ea9c2f..e4cebfd96c 100644 --- a/metagpt/const.py +++ b/metagpt/const.py @@ -49,6 +49,7 @@ def get_metagpt_root(): DEFAULT_WORKSPACE_ROOT = METAGPT_ROOT / "workspace" EXAMPLE_PATH = METAGPT_ROOT / "examples" +EXAMPLE_DATA_PATH = EXAMPLE_PATH / "data" DATA_PATH = METAGPT_ROOT / "data" TEST_DATA_PATH = METAGPT_ROOT / "tests/data" RESEARCH_PATH = DATA_PATH / "research" @@ -122,7 +123,6 @@ def get_metagpt_root(): # REDIS REDIS_KEY = "REDIS_KEY" -LLM_API_TIMEOUT = 300 # Message id IGNORED_MESSAGE_ID = "0" @@ -131,3 +131,7 @@ def get_metagpt_root(): GENERALIZATION = "Generalize" COMPOSITION =
"Composite" AGGREGATION = "Aggregate" + +# Timeout +USE_CONFIG_TIMEOUT = 0 # Using llm.timeout configuration. +LLM_API_TIMEOUT = 300 diff --git a/metagpt/context.py b/metagpt/context.py index 0add4c71ae..2bd5412026 100644 --- a/metagpt/context.py +++ b/metagpt/context.py @@ -7,7 +7,7 @@ """ import os from pathlib import Path -from typing import Any, Optional +from typing import Any, Dict, Optional from pydantic import BaseModel, ConfigDict @@ -78,12 +78,6 @@ def new_environ(self): # env.update({k: v for k, v in i.items() if isinstance(v, str)}) return env - # def use_llm(self, name: Optional[str] = None, provider: LLMType = LLMType.OPENAI) -> BaseLLM: - # """Use a LLM instance""" - # self._llm_config = self.config.get_llm_config(name, provider) - # self._llm = None - # return self._llm - def _select_costmanager(self, llm_config: LLMConfig) -> CostManager: """Return a CostManager instance""" if llm_config.api_type == LLMType.FIREWORKS: @@ -108,3 +102,38 @@ def llm_with_cost_manager_from_llm_config(self, llm_config: LLMConfig) -> BaseLL if llm.cost_manager is None: llm.cost_manager = self._select_costmanager(llm_config) return llm + + def serialize(self) -> Dict[str, Any]: + """Serialize the object's attributes into a dictionary. + + Returns: + Dict[str, Any]: A dictionary containing serialized data. + """ + return { + "workdir": str(self.repo.workdir) if self.repo else "", + "kwargs": {k: v for k, v in self.kwargs.__dict__.items()}, + "cost_manager": self.cost_manager.model_dump_json(), + } + + def deserialize(self, serialized_data: Dict[str, Any]): + """Deserialize the given serialized data and update the object's attributes accordingly. + + Args: + serialized_data (Dict[str, Any]): A dictionary containing serialized data. + """ + if not serialized_data: + return + workdir = serialized_data.get("workdir") + if workdir: + self.git_repo = GitRepository(local_path=workdir, auto_init=True) + self.repo = ProjectRepo(self.git_repo) + src_workspace = self.git_repo.workdir / self.git_repo.workdir.name + if src_workspace.exists(): + self.src_workspace = src_workspace + kwargs = serialized_data.get("kwargs") + if kwargs: + for k, v in kwargs.items(): + self.kwargs.set(k, v) + cost_manager = serialized_data.get("cost_manager") + if cost_manager: + self.cost_manager.model_validate_json(cost_manager) diff --git a/metagpt/document.py b/metagpt/document.py index 1a3c4b2f8b..4a8bb68d5c 100644 --- a/metagpt/document.py +++ b/metagpt/document.py @@ -11,12 +11,9 @@ from typing import Optional, Union import pandas as pd -from langchain.text_splitter import CharacterTextSplitter -from langchain_community.document_loaders import ( - TextLoader, - UnstructuredPDFLoader, - UnstructuredWordDocumentLoader, -) +from llama_index.core import Document, SimpleDirectoryReader +from llama_index.core.node_parser import SimpleNodeParser +from llama_index.readers.file import PDFReader from pydantic import BaseModel, ConfigDict, Field from tqdm import tqdm @@ -29,7 +26,7 @@ def validate_cols(content_col: str, df: pd.DataFrame): raise ValueError("Content column not found in DataFrame.") -def read_data(data_path: Path): +def read_data(data_path: Path) -> Union[pd.DataFrame, list[Document]]: suffix = data_path.suffix if ".xlsx" == suffix: data = pd.read_excel(data_path) @@ -38,14 +35,13 @@ def read_data(data_path: Path): elif ".json" == suffix: data = pd.read_json(data_path) elif suffix in (".docx", ".doc"): - data = UnstructuredWordDocumentLoader(str(data_path), mode="elements").load() + data = 
SimpleDirectoryReader(input_files=[str(data_path)]).load_data() elif ".txt" == suffix: - data = TextLoader(str(data_path)).load() - text_splitter = CharacterTextSplitter(separator="\n", chunk_size=256, chunk_overlap=0) - texts = text_splitter.split_documents(data) - data = texts + data = SimpleDirectoryReader(input_files=[str(data_path)]).load_data() + node_parser = SimpleNodeParser.from_defaults(separator="\n", chunk_size=256, chunk_overlap=0) + data = node_parser.get_nodes_from_documents(data) elif ".pdf" == suffix: - data = UnstructuredPDFLoader(str(data_path), mode="elements").load() + data = PDFReader.load_data(str(data_path)) else: raise NotImplementedError("File format not supported.") return data @@ -150,9 +146,9 @@ def _get_docs_and_metadatas_by_df(self) -> (list, list): metadatas.append({}) return docs, metadatas - def _get_docs_and_metadatas_by_langchain(self) -> (list, list): + def _get_docs_and_metadatas_by_llamaindex(self) -> (list, list): data = self.data - docs = [i.page_content for i in data] + docs = [i.text for i in data] metadatas = [i.metadata for i in data] return docs, metadatas @@ -160,7 +156,7 @@ def get_docs_and_metadatas(self) -> (list, list): if isinstance(self.data, pd.DataFrame): return self._get_docs_and_metadatas_by_df() elif isinstance(self.data, list): - return self._get_docs_and_metadatas_by_langchain() + return self._get_docs_and_metadatas_by_llamaindex() else: raise NotImplementedError("Data type not supported for metadata extraction.") diff --git a/metagpt/document_store/base_store.py b/metagpt/document_store/base_store.py index ddc1d626b0..6aafc57bb0 100644 --- a/metagpt/document_store/base_store.py +++ b/metagpt/document_store/base_store.py @@ -38,9 +38,9 @@ def __init__(self, raw_data_path: Path, cache_dir: Path = None): if not self.store: self.store = self.write() - def _get_index_and_store_fname(self, index_ext=".index", pkl_ext=".pkl"): - index_file = self.cache_dir / f"{self.fname}{index_ext}" - store_file = self.cache_dir / f"{self.fname}{pkl_ext}" + def _get_index_and_store_fname(self, index_ext=".json", docstore_ext=".json"): + index_file = self.cache_dir / "default__vector_store" / index_ext + store_file = self.cache_dir / "docstore" / docstore_ext return index_file, store_file @abstractmethod diff --git a/metagpt/document_store/chromadb_store.py b/metagpt/document_store/chromadb_store.py index d7344d41ba..1d3a014ee6 100644 --- a/metagpt/document_store/chromadb_store.py +++ b/metagpt/document_store/chromadb_store.py @@ -11,9 +11,9 @@ class ChromaStore: """If inherited from BaseStore, or importing other modules from metagpt, a Python exception occurs, which is strange.""" - def __init__(self, name): + def __init__(self, name: str, get_or_create: bool = False): client = chromadb.Client() - collection = client.create_collection(name) + collection = client.create_collection(name, get_or_create=get_or_create) self.client = client self.collection = collection diff --git a/metagpt/document_store/faiss_store.py b/metagpt/document_store/faiss_store.py index 2359917d5d..b196bef270 100644 --- a/metagpt/document_store/faiss_store.py +++ b/metagpt/document_store/faiss_store.py @@ -7,10 +7,14 @@ """ import asyncio from pathlib import Path -from typing import Optional +from typing import Any, Optional -from langchain.vectorstores import FAISS -from langchain_core.embeddings import Embeddings +import faiss +from llama_index.core import VectorStoreIndex, load_index_from_storage +from llama_index.core.embeddings import BaseEmbedding +from 
llama_index.core.schema import Document, QueryBundle, TextNode +from llama_index.core.storage import StorageContext +from llama_index.vector_stores.faiss import FaissVectorStore from metagpt.document import IndexableDocument from metagpt.document_store.base_store import LocalStore @@ -20,36 +24,50 @@ class FaissStore(LocalStore): def __init__( - self, raw_data: Path, cache_dir=None, meta_col="source", content_col="output", embedding: Embeddings = None + self, raw_data: Path, cache_dir=None, meta_col="source", content_col="output", embedding: BaseEmbedding = None ): self.meta_col = meta_col self.content_col = content_col self.embedding = embedding or get_embedding() + self.store: VectorStoreIndex super().__init__(raw_data, cache_dir) - def _load(self) -> Optional["FaissStore"]: - index_file, store_file = self._get_index_and_store_fname(index_ext=".faiss") # langchain FAISS using .faiss + def _load(self) -> Optional["VectorStoreIndex"]: + index_file, store_file = self._get_index_and_store_fname() if not (index_file.exists() and store_file.exists()): logger.info("Missing at least one of index_file/store_file, load failed and return None") return None + vector_store = FaissVectorStore.from_persist_dir(persist_dir=self.cache_dir) + storage_context = StorageContext.from_defaults(persist_dir=self.cache_dir, vector_store=vector_store) + index = load_index_from_storage(storage_context, embed_model=self.embedding) - return FAISS.load_local(self.raw_data_path.parent, self.embedding, self.fname) + return index - def _write(self, docs, metadatas): - store = FAISS.from_texts(docs, self.embedding, metadatas=metadatas) - return store + def _write(self, docs: list[str], metadatas: list[dict[str, Any]]) -> VectorStoreIndex: + assert len(docs) == len(metadatas) + documents = [Document(text=doc, metadata=metadatas[idx]) for idx, doc in enumerate(docs)] + + vector_store = FaissVectorStore(faiss_index=faiss.IndexFlatL2(1536)) + storage_context = StorageContext.from_defaults(vector_store=vector_store) + index = VectorStoreIndex.from_documents( + documents=documents, storage_context=storage_context, embed_model=self.embedding + ) + + return index def persist(self): - self.store.save_local(self.raw_data_path.parent, self.fname) + self.store.storage_context.persist(self.cache_dir) + + def search(self, query: str, expand_cols=False, sep="\n", *args, k=5, **kwargs): + retriever = self.store.as_retriever(similarity_top_k=k) + rsp = retriever.retrieve(QueryBundle(query_str=query, embedding=self.embedding.get_text_embedding(query))) - def search(self, query, expand_cols=False, sep="\n", *args, k=5, **kwargs): - rsp = self.store.similarity_search(query, k=k, **kwargs) logger.debug(rsp) if expand_cols: - return str(sep.join([f"{x.page_content}: {x.metadata}" for x in rsp])) + return str(sep.join([f"{x.node.text}: {x.node.metadata}" for x in rsp])) else: - return str(sep.join([f"{x.page_content}" for x in rsp])) + return str(sep.join([f"{x.node.text}" for x in rsp])) async def asearch(self, *args, **kwargs): return await asyncio.to_thread(self.search, *args, **kwargs) @@ -67,8 +85,12 @@ def write(self): def add(self, texts: list[str], *args, **kwargs) -> list[str]: """FIXME: Currently, the store is not updated after adding.""" - return self.store.add_texts(texts) + texts_embeds = self.embedding.get_text_embedding_batch(texts) + nodes = [TextNode(text=texts[idx], embedding=embed) for idx, embed in enumerate(texts_embeds)] + self.store.insert_nodes(nodes) + + return [] def delete(self, *args, **kwargs): - """Currently, 
langchain does not provide a delete interface.""" + """Currently, faiss does not provide a delete interface.""" raise NotImplementedError diff --git a/metagpt/environment/README.md b/metagpt/environment/README.md index 9476ac75ac..bb7d50d501 100644 --- a/metagpt/environment/README.md +++ b/metagpt/environment/README.md @@ -21,7 +21,7 @@ observe & step inside role's actions from metagpt.environment.api.env_api import EnvAPIAbstract # get screenshot from ExtEnv -screenshot_path: Path = env.observe( +screenshot_path: Path = await env.observe( EnvAPIAbstract( api_name="get_screenshot", kwargs={"ss_name": f"{round_count}_before", "local_save_dir": task_dir} ) @@ -34,5 +34,5 @@ res = env.step(EnvAPIAbstract("system_tap", kwargs={"x": x, "y": y})) ## TODO - add android app operation assistant under `examples/android_assistant` - migrate roles/actions of werewolf game from old version into current version -- migrate roles/actions of mincraft game from old version into current version +- migrate roles/actions of minecraft game from old version into current version - migrate roles/actions of stanford_town game from old version into current version diff --git a/metagpt/environment/__init__.py b/metagpt/environment/__init__.py index 692672fa7d..0621fff165 100644 --- a/metagpt/environment/__init__.py +++ b/metagpt/environment/__init__.py @@ -3,11 +3,10 @@ # @Desc : from metagpt.environment.base_env import Environment -from metagpt.environment.android_env.android_env import AndroidEnv -from metagpt.environment.mincraft_env.mincraft_env import MincraftExtEnv -from metagpt.environment.werewolf_env.werewolf_env import WerewolfEnv -from metagpt.environment.stanford_town_env.stanford_town_env import StanfordTownEnv -from metagpt.environment.software_env.software_env import SoftwareEnv +from metagpt.environment.android.android_env import AndroidEnv +from metagpt.environment.werewolf.werewolf_env import WerewolfEnv +from metagpt.environment.stanford_town.stanford_town_env import StanfordTownEnv +from metagpt.environment.software.software_env import SoftwareEnv -__all__ = ["AndroidEnv", "MincraftExtEnv", "WerewolfEnv", "StanfordTownEnv", "SoftwareEnv", "Environment"] +__all__ = ["AndroidEnv", "WerewolfEnv", "StanfordTownEnv", "SoftwareEnv", "Environment"] diff --git a/metagpt/environment/mincraft_env/__init__.py b/metagpt/environment/android/__init__.py similarity index 100% rename from metagpt/environment/mincraft_env/__init__.py rename to metagpt/environment/android/__init__.py diff --git a/metagpt/environment/android_env/android_env.py b/metagpt/environment/android/android_env.py similarity index 59% rename from metagpt/environment/android_env/android_env.py rename to metagpt/environment/android/android_env.py index c27e20541e..66672d219e 100644 --- a/metagpt/environment/android_env/android_env.py +++ b/metagpt/environment/android/android_env.py @@ -4,10 +4,12 @@ from pydantic import Field -from metagpt.environment.android_env.android_ext_env import AndroidExtEnv +from metagpt.environment.android.android_ext_env import AndroidExtEnv from metagpt.environment.base_env import Environment -class AndroidEnv(Environment, AndroidExtEnv): +class AndroidEnv(AndroidExtEnv, Environment): + """the inheritance order is AndroidExtEnv then Environment, so that the actual `reset` & `observe` implementations are used""" + rows: int = Field(default=0, description="rows of a grid on the screenshot") cols: int = Field(default=0, description="cols of a grid on the screenshot") diff --git a/metagpt/environment/android_env/android_ext_env.py
b/metagpt/environment/android/android_ext_env.py similarity index 66% rename from metagpt/environment/android_env/android_ext_env.py rename to metagpt/environment/android/android_ext_env.py index b81b2cd262..826846e4fb 100644 --- a/metagpt/environment/android_env/android_ext_env.py +++ b/metagpt/environment/android/android_ext_env.py @@ -8,7 +8,14 @@ from pydantic import Field -from metagpt.environment.android_env.const import ADB_EXEC_FAIL +from metagpt.environment.android.const import ADB_EXEC_FAIL +from metagpt.environment.android.env_space import ( + EnvAction, + EnvActionType, + EnvObsParams, + EnvObsType, + EnvObsValType, +) from metagpt.environment.base_env import ExtEnv, mark_as_readable, mark_as_writeable @@ -21,11 +28,70 @@ class AndroidExtEnv(ExtEnv): def __init__(self, **data: Any): super().__init__(**data) - if data.get("device_id"): + device_id = data.get("device_id") + if device_id: + devices = self.list_devices() + if device_id not in devices: + raise RuntimeError(f"device-id: {device_id} not found") (width, height) = self.device_shape self.width = data.get("width", width) self.height = data.get("height", height) + self.create_device_path(self.screenshot_dir) + self.create_device_path(self.xml_dir) + + def reset( + self, + *, + seed: Optional[int] = None, + options: Optional[dict[str, Any]] = None, + ) -> tuple[dict[str, Any], dict[str, Any]]: + super().reset(seed=seed, options=options) + + obs = self._get_obs() + + return obs, {} + + def _get_obs(self) -> dict[str, EnvObsValType]: + pass + + def observe(self, obs_params: Optional[EnvObsParams] = None) -> Any: + obs_type = obs_params.obs_type if obs_params else EnvObsType.NONE + if obs_type == EnvObsType.NONE: + pass + elif obs_type == EnvObsType.GET_SCREENSHOT: + obs = self.get_screenshot(ss_name=obs_params.ss_name, local_save_dir=obs_params.local_save_dir) + elif obs_type == EnvObsType.GET_XML: + obs = self.get_xml(xml_name=obs_params.xml_name, local_save_dir=obs_params.local_save_dir) + return obs + + def step(self, action: EnvAction) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]: + res = self._execute_env_action(action) + + obs = {} + + ret = (obs, 1.0, False, False, {"res": res}) + return ret + + def _execute_env_action(self, action: EnvAction): + action_type = action.action_type + res = None + if action_type == EnvActionType.NONE: + pass + elif action_type == EnvActionType.SYSTEM_BACK: + res = self.system_back() + elif action_type == EnvActionType.SYSTEM_TAP: + res = self.system_tap(x=action.coord[0], y=action.coord[1]) + elif action_type == EnvActionType.USER_INPUT: + res = self.user_input(input_txt=action.input_txt) + elif action_type == EnvActionType.USER_LONGPRESS: + res = self.user_longpress(x=action.coord[0], y=action.coord[1]) + elif action_type == EnvActionType.USER_SWIPE: + res = self.user_swipe(x=action.coord[0], y=action.coord[1], orient=action.orient, dist=action.dist) + elif action_type == EnvActionType.USER_SWIPE_TO: + res = self.user_swipe_to(start=action.coord, end=action.tgt_coord) + return res + @property def adb_prefix_si(self): """adb cmd prefix with `device_id` and `shell input`""" @@ -42,12 +108,19 @@ def adb_prefix(self): return f"adb -s {self.device_id} " def execute_adb_with_cmd(self, adb_cmd: str) -> str: + adb_cmd = adb_cmd.replace("\\", "/") res = subprocess.run(adb_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True) exec_res = ADB_EXEC_FAIL if not res.returncode: exec_res = res.stdout.strip() return exec_res + def create_device_path(self, folder_path: 
Path): + adb_cmd = f"{self.adb_prefix_shell} mkdir {folder_path} -p" + res = self.execute_adb_with_cmd(adb_cmd) + if res == ADB_EXEC_FAIL: + raise RuntimeError(f"create device path: {folder_path} failed") + @property def device_shape(self) -> tuple[int, int]: adb_cmd = f"{self.adb_prefix_shell} wm size" diff --git a/metagpt/environment/android_env/const.py b/metagpt/environment/android/const.py similarity index 100% rename from metagpt/environment/android_env/const.py rename to metagpt/environment/android/const.py diff --git a/metagpt/environment/android/env_space.py b/metagpt/environment/android/env_space.py new file mode 100644 index 0000000000..9580e3a7d3 --- /dev/null +++ b/metagpt/environment/android/env_space.py @@ -0,0 +1,92 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : + +from pathlib import Path +from typing import Union + +import numpy as np +import numpy.typing as npt +from gymnasium import spaces +from pydantic import ConfigDict, Field, field_validator + +from metagpt.environment.base_env_space import ( + BaseEnvAction, + BaseEnvActionType, + BaseEnvObsParams, + BaseEnvObsType, +) + + +class EnvActionType(BaseEnvActionType): + NONE = 0 # no action to run, just get observation + + SYSTEM_BACK = 1 + SYSTEM_TAP = 2 + USER_INPUT = 3 + USER_LONGPRESS = 4 + USER_SWIPE = 5 + USER_SWIPE_TO = 6 + + +class EnvAction(BaseEnvAction): + model_config = ConfigDict(arbitrary_types_allowed=True) + + action_type: int = Field(default=EnvActionType.NONE, description="action type") + coord: npt.NDArray[np.int64] = Field( + default_factory=lambda: np.zeros(2, dtype=np.int64), description="operation coordinate" + ) + tgt_coord: npt.NDArray[np.int64] = Field( + default_factory=lambda: np.zeros(2, dtype=np.int64), description="target operation coordinate" + ) + input_txt: str = Field(default="", description="user input text") + orient: str = Field(default="up", description="swipe orient") + dist: str = Field(default="medium", description="swipe dist") + + @field_validator("coord", "tgt_coord", mode="before") + @classmethod + def check_coord(cls, coord) -> npt.NDArray[np.int64]: + if not isinstance(coord, np.ndarray): + return np.array(coord) + return coord + + +class EnvObsType(BaseEnvObsType): + NONE = 0 # get whole observation from env + + GET_SCREENSHOT = 1 + GET_XML = 2 + + +class EnvObsParams(BaseEnvObsParams): + model_config = ConfigDict(arbitrary_types_allowed=True) + + obs_type: int = Field(default=EnvObsType.NONE, description="observation type") + ss_name: str = Field(default="", description="screenshot file name") + xml_name: str = Field(default="", description="xml file name") + local_save_dir: Union[str, Path] = Field(default="", description="local dir to save file") + + +EnvObsValType = str + + +def get_observation_space() -> spaces.Dict: + space = spaces.Dict({"screenshot": spaces.Text(256), "xml": spaces.Text(256)}) + return space + + +def get_action_space(device_shape: tuple[int, int]) -> spaces.Dict: + space = spaces.Dict( + { + "action_type": spaces.Discrete(len(EnvActionType)), + "coord": spaces.Box( + np.array([0, 0], dtype=np.int64), np.array([device_shape[0], device_shape[1]], dtype=np.int64) + ), + "tgt_coord": spaces.Box( + np.array([0, 0], dtype=np.int64), np.array([device_shape[0], device_shape[1]], dtype=np.int64) + ), + "input_txt": spaces.Text(256), + "orient": spaces.Text(16), + "dist": spaces.Text(16), + } + ) + return space diff --git a/metagpt/environment/api/env_api.py b/metagpt/environment/api/env_api.py index 1e6df544d9..924f6b1041 ---
a/metagpt/environment/api/env_api.py +++ b/metagpt/environment/api/env_api.py @@ -18,11 +18,11 @@ class EnvAPIAbstract(BaseModel): class EnvAPIRegistry(BaseModel): """the registry to store environment w&r api/interface""" - registry: dict[str, dict[str, Union[dict, Any, str]]] = Field(default=dict(), exclude=True) + registry: dict[str, Callable] = Field(default=dict(), exclude=True) def get(self, api_name: str): if api_name not in self.registry: - raise ValueError + raise KeyError(f"api_name: {api_name} not found") return self.registry.get(api_name) def __getitem__(self, api_name: str) -> Callable: diff --git a/metagpt/environment/base_env.py b/metagpt/environment/base_env.py index 0e583ffb34..024c468776 100644 --- a/metagpt/environment/base_env.py +++ b/metagpt/environment/base_env.py @@ -3,9 +3,12 @@ # @Desc : base env of executing environment import asyncio +from abc import abstractmethod from enum import Enum from typing import TYPE_CHECKING, Any, Dict, Iterable, Optional, Set, Union +from gymnasium import spaces +from gymnasium.core import ActType, ObsType from pydantic import BaseModel, ConfigDict, Field, SerializeAsAny, model_validator from metagpt.context import Context @@ -14,6 +17,7 @@ ReadAPIRegistry, WriteAPIRegistry, ) +from metagpt.environment.base_env_space import BaseEnvAction, BaseEnvObsParams from metagpt.logs import logger from metagpt.schema import Message from metagpt.utils.common import get_function_schema, is_coroutine_func, is_send_to @@ -26,7 +30,7 @@ class EnvType(Enum): ANDROID = "Android" GYM = "Gym" WEREWOLF = "Werewolf" - MINCRAFT = "Mincraft" + MINECRAFT = "Minecraft" STANFORDTOWN = "StanfordTown" @@ -47,7 +51,12 @@ def mark_as_writeable(func): class ExtEnv(BaseModel): - """External Env to intergate actual game environment""" + """External Env to integrate actual game environment""" + + model_config = ConfigDict(arbitrary_types_allowed=True) + + action_space: spaces.Space[ActType] = Field(default_factory=spaces.Space, exclude=True) + observation_space: spaces.Space[ObsType] = Field(default_factory=spaces.Space, exclude=True) def _check_api_exist(self, rw_api: Optional[str] = None): if not rw_api: @@ -61,39 +70,56 @@ def get_all_available_apis(self, mode: str = "read") -> list[Any]: else: return env_write_api_registry.get_apis() - async def observe(self, env_action: Union[str, EnvAPIAbstract]): + async def read_from_api(self, env_action: Union[str, EnvAPIAbstract]): """get observation from particular api of ExtEnv""" if isinstance(env_action, str): - read_api = env_read_api_registry.get(api_name=env_action)["func"] - self._check_api_exist(read_api) - if is_coroutine_func(read_api): - res = await read_api(self) + env_read_api = env_read_api_registry.get(api_name=env_action)["func"] + self._check_api_exist(env_read_api) + if is_coroutine_func(env_read_api): + res = await env_read_api(self) else: - res = read_api(self) + res = env_read_api(self) elif isinstance(env_action, EnvAPIAbstract): - read_api = env_read_api_registry.get(api_name=env_action.api_name)["func"] - self._check_api_exist(read_api) - if is_coroutine_func(read_api): - res = await read_api(self, *env_action.args, **env_action.kwargs) + env_read_api = env_read_api_registry.get(api_name=env_action.api_name)["func"] + self._check_api_exist(env_read_api) + if is_coroutine_func(env_read_api): + res = await env_read_api(self, *env_action.args, **env_action.kwargs) else: - res = read_api(self, *env_action.args, **env_action.kwargs) + res = env_read_api(self, *env_action.args, **env_action.kwargs) return 
res
 
-    async def step(self, env_action: Union[str, Message, EnvAPIAbstract, list[EnvAPIAbstract]]):
+    async def write_thru_api(self, env_action: Union[str, Message, EnvAPIAbstract, list[EnvAPIAbstract]]):
         """execute through particular api of ExtEnv"""
         res = None
         if isinstance(env_action, Message):
             self.publish_message(env_action)
         elif isinstance(env_action, EnvAPIAbstract):
-            write_api = env_write_api_registry.get(env_action.api_name)["func"]
-            self._check_api_exist(write_api)
-            if is_coroutine_func(write_api):
-                res = await write_api(self, *env_action.args, **env_action.kwargs)
+            env_write_api = env_write_api_registry.get(env_action.api_name)["func"]
+            self._check_api_exist(env_write_api)
+            if is_coroutine_func(env_write_api):
+                res = await env_write_api(self, *env_action.args, **env_action.kwargs)
             else:
-                res = write_api(self, *env_action.args, **env_action.kwargs)
+                res = env_write_api(self, *env_action.args, **env_action.kwargs)
 
         return res
 
+    @abstractmethod
+    def reset(
+        self,
+        *,
+        seed: Optional[int] = None,
+        options: Optional[dict[str, Any]] = None,
+    ) -> tuple[dict[str, Any], dict[str, Any]]:
+        """Implement this to get init observation"""
+
+    @abstractmethod
+    def observe(self, obs_params: Optional[BaseEnvObsParams] = None) -> Any:
+        """Implement this if you want to get partial observation from the env"""
+
+    @abstractmethod
+    def step(self, action: BaseEnvAction) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
+        """Implement this to feed an action and then get new observation from the env"""
+
 
 class Environment(ExtEnv):
     """Environment, hosting a batch of roles; roles can publish messages to the environment and can be observed by other roles
@@ -108,6 +134,20 @@ class Environment(ExtEnv):
     history: str = ""  # For debug
     context: Context = Field(default_factory=Context, exclude=True)
 
+    def reset(
+        self,
+        *,
+        seed: Optional[int] = None,
+        options: Optional[dict[str, Any]] = None,
+    ) -> tuple[dict[str, Any], dict[str, Any]]:
+        pass
+
+    def observe(self, obs_params: Optional[BaseEnvObsParams] = None) -> Any:
+        pass
+
+    def step(self, action: BaseEnvAction) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
+        pass
+
     @model_validator(mode="after")
     def init_roles(self):
         self.add_roles(self.roles.values())
@@ -129,8 +169,8 @@ def add_roles(self, roles: Iterable["Role"]):
             self.roles[role.profile] = role
 
         for role in roles:  # setup system message with roles
-            role.set_env(self)
             role.context = self.context
+            role.set_env(self)
 
     def publish_message(self, message: Message, peekable: bool = True) -> bool:
         """
diff --git a/metagpt/environment/base_env_space.py b/metagpt/environment/base_env_space.py
new file mode 100644
index 0000000000..fd0cfa399f
--- /dev/null
+++ b/metagpt/environment/base_env_space.py
@@ -0,0 +1,33 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc   :
+
+from enum import IntEnum
+
+from pydantic import BaseModel, ConfigDict, Field
+
+
+class BaseEnvActionType(IntEnum):
+    # NONE = 0  # no action to run, just get observation
+    pass
+
+
+class BaseEnvAction(BaseModel):
+    """env action type and its related params of action functions/apis"""
+
+    model_config = ConfigDict(arbitrary_types_allowed=True)
+
+    action_type: int = Field(default=0, description="action type")
+
+
+class BaseEnvObsType(IntEnum):
+    # NONE = 0  # get whole observation from env
+    pass
+
+
+class BaseEnvObsParams(BaseModel):
+    """observation params for different EnvObsType to get its observe result"""
+
+    model_config = ConfigDict(arbitrary_types_allowed=True)
+
+    obs_type: int = Field(default=0, description="observation type")
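`ExtEnv` now fixes the contract every concrete environment must satisfy: `reset`, `observe`, and `step` built on the `BaseEnvAction`/`BaseEnvObsParams` types above. A toy subclass, purely illustrative (`CounterExtEnv` is not part of the codebase), shows the minimum a new environment has to implement:

```python
from typing import Any, Optional

from metagpt.environment.base_env import ExtEnv
from metagpt.environment.base_env_space import BaseEnvAction, BaseEnvObsParams


class CounterExtEnv(ExtEnv):
    count: int = 0  # toy state

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict[str, Any]] = None):
        self.count = 0
        return {"count": self.count}, {}

    def observe(self, obs_params: Optional[BaseEnvObsParams] = None) -> Any:
        return {"count": self.count}

    def step(self, action: BaseEnvAction):
        self.count += action.action_type  # toy transition
        return {"count": self.count}, 1.0, False, False, {}
```

diff --git 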
a/metagpt/environment/software_env/__init__.py b/metagpt/environment/minecraft/__init__.py similarity index 100% rename from metagpt/environment/software_env/__init__.py rename to metagpt/environment/minecraft/__init__.py diff --git a/metagpt/environment/mincraft_env/const.py b/metagpt/environment/minecraft/const.py similarity index 92% rename from metagpt/environment/mincraft_env/const.py rename to metagpt/environment/minecraft/const.py index a7222f9cd4..8ac15decc8 100644 --- a/metagpt/environment/mincraft_env/const.py +++ b/metagpt/environment/minecraft/const.py @@ -4,8 +4,8 @@ from metagpt.const import METAGPT_ROOT -# For Mincraft Game Agent -MC_CKPT_DIR = METAGPT_ROOT / "data/mincraft/ckpt" +# For Minecraft Game Agent +MC_CKPT_DIR = METAGPT_ROOT / "data/minecraft/ckpt" MC_LOG_DIR = METAGPT_ROOT / "logs" MC_DEFAULT_WARMUP = { "context": 15, diff --git a/metagpt/environment/mincraft_env/mincraft_env.py b/metagpt/environment/minecraft/minecraft_env.py similarity index 93% rename from metagpt/environment/mincraft_env/mincraft_env.py rename to metagpt/environment/minecraft/minecraft_env.py index 6327aa3f4b..31a48964bb 100644 --- a/metagpt/environment/mincraft_env/mincraft_env.py +++ b/metagpt/environment/minecraft/minecraft_env.py @@ -1,6 +1,6 @@ #!/usr/bin/env python # -*- coding: utf-8 -*- -# @Desc : MG Mincraft Env +# @Desc : MG Minecraft Env # refs to `voyager voyager.py` import json @@ -8,20 +8,19 @@ import time from typing import Any, Iterable -from langchain.embeddings.openai import OpenAIEmbeddings -from langchain.vectorstores import Chroma +from llama_index.vector_stores.chroma import ChromaVectorStore from pydantic import ConfigDict, Field from metagpt.config2 import config as CONFIG from metagpt.environment.base_env import Environment -from metagpt.environment.mincraft_env.const import MC_CKPT_DIR -from metagpt.environment.mincraft_env.mincraft_ext_env import MincraftExtEnv +from metagpt.environment.minecraft.const import MC_CKPT_DIR +from metagpt.environment.minecraft.minecraft_ext_env import MinecraftExtEnv from metagpt.logs import logger from metagpt.utils.common import load_mc_skills_code, read_json_file, write_json_file -class MincraftEnv(Environment, MincraftExtEnv): - """MincraftEnv, including shared memory of cache and infomation between roles""" +class MinecraftEnv(MinecraftExtEnv, Environment): + """MinecraftEnv, including shared memory of cache and information between roles""" model_config = ConfigDict(arbitrary_types_allowed=True) @@ -48,9 +47,9 @@ class MincraftEnv(Environment, MincraftExtEnv): runtime_status: bool = False # equal to action execution status: success or failed - vectordb: Chroma = Field(default_factory=Chroma) + vectordb: ChromaVectorStore = Field(default_factory=ChromaVectorStore) - qa_cache_questions_vectordb: Chroma = Field(default_factory=Chroma) + qa_cache_questions_vectordb: ChromaVectorStore = Field(default_factory=ChromaVectorStore) @property def progress(self): @@ -73,16 +72,14 @@ def set_mc_port(self, mc_port): self.set_mc_resume() def set_mc_resume(self): - self.qa_cache_questions_vectordb = Chroma( + self.qa_cache_questions_vectordb = ChromaVectorStore( collection_name="qa_cache_questions_vectordb", - embedding_function=OpenAIEmbeddings(), - persist_directory=f"{MC_CKPT_DIR}/curriculum/vectordb", + persist_dir=f"{MC_CKPT_DIR}/curriculum/vectordb", ) - self.vectordb = Chroma( + self.vectordb = ChromaVectorStore( collection_name="skill_vectordb", - embedding_function=OpenAIEmbeddings(), - persist_directory=f"{MC_CKPT_DIR}/skill/vectordb", + 
persist_dir=f"{MC_CKPT_DIR}/skill/vectordb", ) if CONFIG.resume: @@ -285,7 +282,7 @@ def update_exploration_progress(self, success: bool): position = event["status"]["position"] blocks.append(block) positions.append(position) - new_events = self.step( + new_events = self._step( f"await givePlacedItemBack(bot, {json.dumps(blocks)}, {json.dumps(positions)})", programs=self.programs, ) @@ -326,7 +323,7 @@ async def on_event_retrieve(self, *args): Exception: If there is an issue retrieving events. """ try: - self.reset( + self._reset( options={ "mode": "soft", "wait_ticks": 20, @@ -335,13 +332,13 @@ async def on_event_retrieve(self, *args): # difficulty = "easy" if len(self.completed_tasks) > 15 else "peaceful" difficulty = "peaceful" - events = self.step("bot.chat(`/time set ${getNextTime()}`);\n" + f"bot.chat('/difficulty {difficulty}');") + events = self._step("bot.chat(`/time set ${getNextTime()}`);\n" + f"bot.chat('/difficulty {difficulty}');") self.update_event(events) return events except Exception as e: time.sleep(3) # wait for mineflayer to exit # reset bot status here - events = self.reset( + events = self._reset( options={ "mode": "hard", "wait_ticks": 20, @@ -368,7 +365,7 @@ async def on_event_execute(self, *args): Exception: If there is an issue retrieving events. """ try: - events = self.step( + events = self._step( code=self.code, programs=self.programs, ) @@ -377,7 +374,7 @@ async def on_event_execute(self, *args): except Exception as e: time.sleep(3) # wait for mineflayer to exit # reset bot status here - events = self.reset( + events = self._reset( options={ "mode": "hard", "wait_ticks": 20, diff --git a/metagpt/environment/mincraft_env/mincraft_ext_env.py b/metagpt/environment/minecraft/minecraft_ext_env.py similarity index 87% rename from metagpt/environment/mincraft_env/mincraft_ext_env.py rename to metagpt/environment/minecraft/minecraft_ext_env.py index b86250d8cd..0436bc3aa0 100644 --- a/metagpt/environment/mincraft_env/mincraft_ext_env.py +++ b/metagpt/environment/minecraft/minecraft_ext_env.py @@ -1,28 +1,29 @@ #!/usr/bin/env python # -*- coding: utf-8 -*- -# @Desc : The Mincraft external environment to integrate with Mincraft game +# @Desc : The Minecraft external environment to integrate with Minecraft game # refs to `voyager bridge.py` import json import time -from typing import Optional +from typing import Any, Optional import requests from pydantic import ConfigDict, Field, model_validator from metagpt.environment.base_env import ExtEnv, mark_as_writeable -from metagpt.environment.mincraft_env.const import ( +from metagpt.environment.base_env_space import BaseEnvAction, BaseEnvObsParams +from metagpt.environment.minecraft.const import ( MC_CKPT_DIR, MC_CORE_INVENTORY_ITEMS, MC_CURRICULUM_OB, MC_DEFAULT_WARMUP, METAGPT_ROOT, ) -from metagpt.environment.mincraft_env.process_monitor import SubprocessMonitor +from metagpt.environment.minecraft.process_monitor import SubprocessMonitor from metagpt.logs import logger -class MincraftExtEnv(ExtEnv): +class MinecraftExtEnv(ExtEnv): model_config = ConfigDict(arbitrary_types_allowed=True) mc_port: Optional[int] = Field(default=None) @@ -38,6 +39,20 @@ class MincraftExtEnv(ExtEnv): server_paused: bool = Field(default=False) warm_up: dict = Field(default=dict()) + def reset( + self, + *, + seed: Optional[int] = None, + options: Optional[dict[str, Any]] = None, + ) -> tuple[dict[str, Any], dict[str, Any]]: + pass + + def observe(self, obs_params: Optional[BaseEnvObsParams] = None) -> Any: + pass + + def step(self, action: 
BaseEnvAction) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]: + pass + @property def server(self) -> str: return f"{self.server_host}:{self.server_port}" @@ -48,7 +63,7 @@ def _post_init_ext_env(self): self.mineflayer = SubprocessMonitor( commands=[ "node", - METAGPT_ROOT.joinpath("metagpt", "environment", "mincraft_env", "mineflayer", "index.js"), + METAGPT_ROOT.joinpath("metagpt", "environment", "minecraft", "mineflayer", "index.js"), str(self.server_port), ], name="mineflayer", @@ -115,7 +130,7 @@ def check_process(self) -> dict: return res.json() @mark_as_writeable - def reset(self, *, seed=None, options=None) -> dict: + def _reset(self, *, seed=None, options=None) -> dict: if options is None: options = {} if options.get("inventory", {}) and options.get("mode", "hard") != "hard": @@ -145,7 +160,7 @@ def reset(self, *, seed=None, options=None) -> dict: return json.loads(returned_data) @mark_as_writeable - def step(self, code: str, programs: str = "") -> dict: + def _step(self, code: str, programs: str = "") -> dict: if not self.has_reset: raise RuntimeError("Environment has not been reset yet") self.check_process() diff --git a/metagpt/environment/mincraft_env/mineflayer/.gitignore b/metagpt/environment/minecraft/mineflayer/.gitignore similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/.gitignore rename to metagpt/environment/minecraft/mineflayer/.gitignore diff --git a/metagpt/environment/mincraft_env/mineflayer/.prettierignore b/metagpt/environment/minecraft/mineflayer/.prettierignore similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/.prettierignore rename to metagpt/environment/minecraft/mineflayer/.prettierignore diff --git a/metagpt/environment/mincraft_env/mineflayer/.prettierrc.json b/metagpt/environment/minecraft/mineflayer/.prettierrc.json similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/.prettierrc.json rename to metagpt/environment/minecraft/mineflayer/.prettierrc.json diff --git a/metagpt/environment/mincraft_env/mineflayer/index.js b/metagpt/environment/minecraft/mineflayer/index.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/index.js rename to metagpt/environment/minecraft/mineflayer/index.js diff --git a/metagpt/environment/mincraft_env/mineflayer/lib/observation/base.js b/metagpt/environment/minecraft/mineflayer/lib/observation/base.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/lib/observation/base.js rename to metagpt/environment/minecraft/mineflayer/lib/observation/base.js diff --git a/metagpt/environment/mincraft_env/mineflayer/lib/observation/chests.js b/metagpt/environment/minecraft/mineflayer/lib/observation/chests.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/lib/observation/chests.js rename to metagpt/environment/minecraft/mineflayer/lib/observation/chests.js diff --git a/metagpt/environment/mincraft_env/mineflayer/lib/observation/inventory.js b/metagpt/environment/minecraft/mineflayer/lib/observation/inventory.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/lib/observation/inventory.js rename to metagpt/environment/minecraft/mineflayer/lib/observation/inventory.js diff --git a/metagpt/environment/mincraft_env/mineflayer/lib/observation/onChat.js b/metagpt/environment/minecraft/mineflayer/lib/observation/onChat.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/lib/observation/onChat.js rename to 
metagpt/environment/minecraft/mineflayer/lib/observation/onChat.js diff --git a/metagpt/environment/mincraft_env/mineflayer/lib/observation/onError.js b/metagpt/environment/minecraft/mineflayer/lib/observation/onError.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/lib/observation/onError.js rename to metagpt/environment/minecraft/mineflayer/lib/observation/onError.js diff --git a/metagpt/environment/mincraft_env/mineflayer/lib/observation/onSave.js b/metagpt/environment/minecraft/mineflayer/lib/observation/onSave.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/lib/observation/onSave.js rename to metagpt/environment/minecraft/mineflayer/lib/observation/onSave.js diff --git a/metagpt/environment/mincraft_env/mineflayer/lib/observation/status.js b/metagpt/environment/minecraft/mineflayer/lib/observation/status.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/lib/observation/status.js rename to metagpt/environment/minecraft/mineflayer/lib/observation/status.js diff --git a/metagpt/environment/mincraft_env/mineflayer/lib/observation/voxels.js b/metagpt/environment/minecraft/mineflayer/lib/observation/voxels.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/lib/observation/voxels.js rename to metagpt/environment/minecraft/mineflayer/lib/observation/voxels.js diff --git a/metagpt/environment/mincraft_env/mineflayer/lib/skillLoader.js b/metagpt/environment/minecraft/mineflayer/lib/skillLoader.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/lib/skillLoader.js rename to metagpt/environment/minecraft/mineflayer/lib/skillLoader.js diff --git a/metagpt/environment/mincraft_env/mineflayer/lib/utils.js b/metagpt/environment/minecraft/mineflayer/lib/utils.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/lib/utils.js rename to metagpt/environment/minecraft/mineflayer/lib/utils.js diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/.gitignore b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/.gitignore similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/.gitignore rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/.gitignore diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/LICENSE b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/LICENSE similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/LICENSE rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/LICENSE diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/README.md b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/README.md similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/README.md rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/README.md diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/_config.yml b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/_config.yml similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/_config.yml rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/_config.yml diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/docs/api.md 
b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/docs/api.md similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/docs/api.md rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/docs/api.md diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/examples/collector.js b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/examples/collector.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/examples/collector.js rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/examples/collector.js diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/examples/oreMiner.js b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/examples/oreMiner.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/examples/oreMiner.js rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/examples/oreMiner.js diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/examples/storageBot.js b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/examples/storageBot.js similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/examples/storageBot.js rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/examples/storageBot.js diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/package.json b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/package.json similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/package.json rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/package.json diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/BlockVeins.ts b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/BlockVeins.ts similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/BlockVeins.ts rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/BlockVeins.ts diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/CollectBlock.ts b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/CollectBlock.ts similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/CollectBlock.ts rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/CollectBlock.ts diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/Inventory.ts b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/Inventory.ts similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/Inventory.ts rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/Inventory.ts diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/Targets.ts b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/Targets.ts similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/Targets.ts rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/Targets.ts diff --git 
a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/TaskQueue.ts b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/TaskQueue.ts similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/TaskQueue.ts rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/TaskQueue.ts diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/TemporarySubscriber.ts b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/TemporarySubscriber.ts similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/TemporarySubscriber.ts rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/TemporarySubscriber.ts diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/Util.ts b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/Util.ts similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/Util.ts rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/Util.ts diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/index.ts b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/index.ts similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/src/index.ts rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/index.ts diff --git a/metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/tsconfig.json b/metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/tsconfig.json similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/mineflayer-collectblock/tsconfig.json rename to metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/tsconfig.json diff --git a/metagpt/environment/mincraft_env/mineflayer/package.json b/metagpt/environment/minecraft/mineflayer/package.json similarity index 100% rename from metagpt/environment/mincraft_env/mineflayer/package.json rename to metagpt/environment/minecraft/mineflayer/package.json diff --git a/metagpt/environment/mincraft_env/process_monitor.py b/metagpt/environment/minecraft/process_monitor.py similarity index 100% rename from metagpt/environment/mincraft_env/process_monitor.py rename to metagpt/environment/minecraft/process_monitor.py diff --git a/metagpt/environment/stanford_town_env/__init__.py b/metagpt/environment/software/__init__.py similarity index 100% rename from metagpt/environment/stanford_town_env/__init__.py rename to metagpt/environment/software/__init__.py diff --git a/metagpt/environment/software_env/software_env.py b/metagpt/environment/software/software_env.py similarity index 100% rename from metagpt/environment/software_env/software_env.py rename to metagpt/environment/software/software_env.py diff --git a/metagpt/environment/werewolf_env/__init__.py b/metagpt/environment/stanford_town/__init__.py similarity index 100% rename from metagpt/environment/werewolf_env/__init__.py rename to metagpt/environment/stanford_town/__init__.py diff --git a/metagpt/environment/stanford_town/env_space.py b/metagpt/environment/stanford_town/env_space.py new file mode 100644 index 0000000000..e100a29527 --- /dev/null +++ b/metagpt/environment/stanford_town/env_space.py @@ -0,0 +1,105 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : + +from typing import Any, 
Optional, Union
+
+import numpy as np
+import numpy.typing as npt
+from gymnasium import spaces
+from pydantic import ConfigDict, Field, field_validator
+
+from metagpt.environment.base_env_space import (
+    BaseEnvAction,
+    BaseEnvActionType,
+    BaseEnvObsParams,
+    BaseEnvObsType,
+)
+
+
+class EnvActionType(BaseEnvActionType):
+    NONE = 0  # no action to run, just get observation
+
+    ADD_TILE_EVENT = 1  # Add an event triple to a tile
+    RM_TILE_EVENT = 2  # Remove an event triple from a tile
+    TURN_TILE_EVENT_IDLE = 3  # Turn an event triple from a tile into idle
+    RM_TITLE_SUB_EVENT = 4  # Remove an event triple that has the input subject from a tile
+
+
+class EnvAction(BaseEnvAction):
+    """env action type and its related params of action functions/apis"""
+
+    model_config = ConfigDict(arbitrary_types_allowed=True)
+
+    action_type: int = Field(default=EnvActionType.NONE, description="action type")
+    coord: npt.NDArray[np.int64] = Field(
+        default_factory=lambda: np.zeros(2, dtype=np.int64), description="tile coordinate"
+    )
+    subject: str = Field(default="", description="subject name of first element in event")
+    event: tuple[str, Optional[str], Optional[str], Optional[str]] = Field(
+        default=("", None, None, None), description="tile event"
+    )
+
+    @field_validator("coord", mode="before")
+    @classmethod
+    def check_coord(cls, coord) -> npt.NDArray[np.int64]:
+        return coord if isinstance(coord, np.ndarray) else np.array(coord)
+
+
+class EnvObsType(BaseEnvObsType):
+    """get part observation with specific params"""
+
+    NONE = 0  # get whole observation from env
+
+    GET_TITLE = 1  # get the tile detail dictionary with given tile coord
+    TILE_PATH = 2  # get the tile address with given tile coord
+    TILE_NBR = 3  # get the neighbors of given tile coord and its vision radius
+
+
+class EnvObsParams(BaseEnvObsParams):
+    """observation params for different EnvObsType"""
+
+    model_config = ConfigDict(arbitrary_types_allowed=True)
+
+    obs_type: int = Field(default=EnvObsType.NONE, description="observation type")
+    coord: npt.NDArray[np.int64] = Field(
+        default_factory=lambda: np.zeros(2, dtype=np.int64), description="tile coordinate"
+    )
+    level: str = Field(default="", description="address level of the tile")
+    vision_radius: int = Field(default=0, description="the vision radius of current tile")
+
+    @field_validator("coord", mode="before")
+    @classmethod
+    def check_coord(cls, coord) -> npt.NDArray[np.int64]:
+        return coord if isinstance(coord, np.ndarray) else np.array(coord)
+
+
+EnvObsValType = Union[list[list[str]], dict[str, set[tuple[int, int]]], list[list[dict[str, Any]]]]
+
+
+def get_observation_space() -> spaces.Dict:
+    # placeholder spaces: the real observations are nested structures rather than plain discrete values
+    space = spaces.Dict(
+        {"collision_maze": spaces.Discrete(2), "tiles": spaces.Discrete(2), "address_tiles": spaces.Discrete(2)}
+    )
+
+    return space
+
+
+def get_action_space(maze_shape: tuple[int, int]) -> spaces.Dict:
+    """The fields defined by the space correspond to the input parameters of the action except `action_type`"""
+    space = spaces.Dict(
+        {
+            "action_type": spaces.Discrete(len(EnvActionType)),
+            "coord": spaces.Box(
+                np.array([0, 0], dtype=np.int64), np.array([maze_shape[0], maze_shape[1]], dtype=np.int64)
+            ),  # coord of the tile
+            "subject": spaces.Text(256),  # the first element of a tile event
+            "event": spaces.Tuple(
+                (spaces.Text(256), spaces.Text(256), spaces.Text(256), spaces.Text(256))
+            ),  # event is a tuple of four str
+        }
+    )
+    return space
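As a sketch of these space types in use (the coordinate and event triple below are made up for illustration), an action that stamps an event onto a tile and an observation request scoped to a tile neighbourhood look like:

```python
from metagpt.environment.stanford_town.env_space import (
    EnvAction,
    EnvActionType,
    EnvObsParams,
    EnvObsType,
)

# Hypothetical event triple: (subject, predicate, object, description).
action = EnvAction(
    action_type=EnvActionType.ADD_TILE_EVENT,
    coord=(58, 9),
    event=("double studio:common room:sofa", "is", "idle", "idle"),
)

# Ask only for a tile's neighbourhood instead of the full maze observation.
obs_params = EnvObsParams(obs_type=EnvObsType.TILE_NBR, coord=(58, 9), vision_radius=4)
```

diff --git a/metagpt/environment/stanford_town/stanford_town_env.py 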
b/metagpt/environment/stanford_town/stanford_town_env.py new file mode 100644 index 0000000000..af8a882b2d --- /dev/null +++ b/metagpt/environment/stanford_town/stanford_town_env.py @@ -0,0 +1,10 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : MG StanfordTown Env + +from metagpt.environment.base_env import Environment +from metagpt.environment.stanford_town.stanford_town_ext_env import StanfordTownExtEnv + + +class StanfordTownEnv(StanfordTownExtEnv, Environment): + pass diff --git a/metagpt/environment/stanford_town_env/stanford_town_ext_env.py b/metagpt/environment/stanford_town/stanford_town_ext_env.py similarity index 83% rename from metagpt/environment/stanford_town_env/stanford_town_ext_env.py rename to metagpt/environment/stanford_town/stanford_town_ext_env.py index 8a9a659659..30a02d4dbe 100644 --- a/metagpt/environment/stanford_town_env/stanford_town_ext_env.py +++ b/metagpt/environment/stanford_town/stanford_town_ext_env.py @@ -5,11 +5,20 @@ import math from pathlib import Path -from typing import Optional, Tuple +from typing import Any, Optional from pydantic import ConfigDict, Field, model_validator from metagpt.environment.base_env import ExtEnv, mark_as_readable, mark_as_writeable +from metagpt.environment.stanford_town.env_space import ( + EnvAction, + EnvActionType, + EnvObsParams, + EnvObsType, + EnvObsValType, + get_action_space, + get_observation_space, +) from metagpt.utils.common import read_csv_to_list, read_json_file @@ -197,15 +206,82 @@ def _init_maze(cls, values): else: address_tiles[add] = set([(j, i)]) values["address_tiles"] = address_tiles + + values["action_space"] = get_action_space((maze_width, maze_height)) + values["observation_space"] = get_observation_space() return values + def reset( + self, + *, + seed: Optional[int] = None, + options: Optional[dict[str, Any]] = None, + ) -> tuple[dict[str, EnvObsValType], dict[str, Any]]: + """reset env and get the init observation + Return results corresponding to `observation, info` + """ + super().reset(seed=seed, options=options) + + obs = self._get_obs() + + return obs, {} + + def _get_obs(self) -> dict[str, EnvObsValType]: + """Get observation""" + return { + "collision_maze": self.get_collision_maze(), + "tiles": self.tiles, + "address_tiles": self.get_address_tiles(), + } + + def observe(self, obs_params: Optional[EnvObsParams] = None) -> Any: + """Get partial or full observation from the env""" + obs_type = obs_params.obs_type if obs_params else EnvObsType.NONE + if obs_type == EnvObsType.NONE: + obs = self._get_obs() + elif obs_type == EnvObsType.GET_TITLE: + obs = self.access_tile(tile=obs_params.coord) + elif obs_type == EnvObsType.TILE_PATH: + obs = self.get_tile_path(tile=obs_params.coord, level=obs_params.level) + elif obs_type == EnvObsType.TILE_NBR: + obs = self.get_nearby_tiles(tile=obs_params.coord, vision_r=obs_params.vision_radius) + return obs + + def step(self, action: EnvAction) -> tuple[dict[str, EnvObsValType], float, bool, bool, dict[str, Any]]: + """Execute action and then return observation + Return results corresponding to `observation, reward, terminated, truncated, info` + """ + terminated = False + try: + self._execute_env_action(action) + except Exception: + terminated = True + + obs = self._get_obs() + + ret = (obs, 1.0, terminated, False, {}) + return ret + + def _execute_env_action(self, action: EnvAction): + action_type = action.action_type + if action_type == EnvActionType.NONE: + pass + elif action_type == EnvActionType.ADD_TILE_EVENT: + 
self.add_event_from_tile(curr_event=action.event, tile=action.coord)
+        elif action_type == EnvActionType.RM_TILE_EVENT:
+            self.remove_event_from_tile(curr_event=action.event, tile=action.coord)
+        elif action_type == EnvActionType.TURN_TILE_EVENT_IDLE:
+            self.turn_event_from_tile_idle(curr_event=action.event, tile=action.coord)
+        elif action_type == EnvActionType.RM_TITLE_SUB_EVENT:
+            self.remove_subject_events_from_tile(subject=action.subject, tile=action.coord)
+
     def turn_coordinate_to_tile(self, px_coordinate: tuple[int, int]) -> tuple[int, int]:
         """
         Turns a pixel coordinate to a tile coordinate.
         """
         x = math.ceil(px_coordinate[0] / self.sq_tile_size)
         y = math.ceil(px_coordinate[1] / self.sq_tile_size)
-        return (x, y)
+        return x, y
 
     @mark_as_readable
     def get_collision_maze(self) -> list:
@@ -316,10 +392,6 @@ def get_nearby_tiles(self, tile: tuple[int, int], vision_r: int) -> list[tuple[i
                 nearby_tiles += [(i, j)]
         return nearby_tiles
 
-    @mark_as_writeable
-    def add_tiles_event(self, pt_y: int, pt_x: int, event: Tuple[str, str, str, str]):
-        self.tiles[pt_y][pt_x]["events"].add(event)
-
     @mark_as_writeable
     def add_event_from_tile(self, curr_event: tuple[str], tile: tuple[int, int]) -> None:
         """
diff --git a/metagpt/environment/stanford_town_env/stanford_town_env.py b/metagpt/environment/stanford_town_env/stanford_town_env.py
deleted file mode 100644
index 8721d6cd18..0000000000
--- a/metagpt/environment/stanford_town_env/stanford_town_env.py
+++ /dev/null
@@ -1,12 +0,0 @@
-#!/usr/bin/env python
-# -*- coding: utf-8 -*-
-# @Desc   : MG StanfordTown Env
-
-from metagpt.environment.base_env import Environment
-from metagpt.environment.stanford_town_env.stanford_town_ext_env import (
-    StanfordTownExtEnv,
-)
-
-
-class StanfordTownEnv(Environment, StanfordTownExtEnv):
-    pass
diff --git a/tests/metagpt/environment/mincraft_env/__init__.py b/metagpt/environment/werewolf/__init__.py
similarity index 100%
rename from tests/metagpt/environment/mincraft_env/__init__.py
rename to metagpt/environment/werewolf/__init__.py
diff --git a/metagpt/environment/werewolf/const.py b/metagpt/environment/werewolf/const.py
new file mode 100644
index 0000000000..7f810389da
--- /dev/null
+++ b/metagpt/environment/werewolf/const.py
@@ -0,0 +1,121 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc   :
+
+from enum import Enum
+
+from metagpt.const import MESSAGE_ROUTE_TO_ALL
+
+
+class RoleType(Enum):
+    VILLAGER = "Villager"
+    WEREWOLF = "Werewolf"
+    GUARD = "Guard"
+    SEER = "Seer"
+    WITCH = "Witch"
+    MODERATOR = "Moderator"
+
+
+class RoleState(Enum):
+    ALIVE = "alive"  # the role is alive
+    DEAD = "dead"  # killed or poisoned
+    KILLED = "killed"  # killed by werewolf or voting
+    POISONED = "poisoned"  # killed by poison
+    SAVED = "saved"  # saved by antidote
+    PROTECTED = "protected"  # protected by guard
+
+
+class RoleActionRes(Enum):
+    SAVE = "save"
+    PASS = "pass"  # ignore current action output
+
+
+empty_set = set()
+
+# the ordered rules by the moderator to announce to everyone each step
+STEP_INSTRUCTIONS = {
+    0: {
+        "content": "It’s dark, everyone close your eyes. I will talk with you/your team secretly at night.",
+        "send_to": {RoleType.MODERATOR.value},  # for moderator to continue speaking
+        "restricted_to": empty_set,
+    },
+    1: {
+        "content": "Guard, please open your eyes!",
+        "send_to": {RoleType.MODERATOR.value},  # for moderator to continue speaking
+        "restricted_to": empty_set,
+    },
+    2: {
+        "content": """Guard, now tell me who you protect tonight? 
+You only choose one from the following living options please: {living_players}.
+Or you can pass. For example: Protect ...""",
+        "send_to": {RoleType.GUARD.value},
+        "restricted_to": {RoleType.MODERATOR.value, RoleType.GUARD.value},
+    },
+    3: {"content": "Guard, close your eyes", "send_to": {RoleType.MODERATOR.value}, "restricted_to": empty_set},
+    4: {
+        "content": "Werewolves, please open your eyes!",
+        "send_to": {RoleType.MODERATOR.value},
+        "restricted_to": empty_set,
+    },
+    5: {
+        "content": """Werewolves, I secretly tell you that {werewolf_players} are
+all of the {werewolf_num} werewolves! Keep in mind you are teammates. The remaining players are not werewolves.
+Choose one from the following living options please:
+{living_players}. For example: Kill ...""",
+        "send_to": {RoleType.WEREWOLF.value},
+        "restricted_to": {RoleType.MODERATOR.value, RoleType.WEREWOLF.value},
+    },
+    6: {"content": "Werewolves, close your eyes", "send_to": {RoleType.MODERATOR.value}, "restricted_to": empty_set},
+    7: {"content": "Witch, please open your eyes!", "send_to": {RoleType.MODERATOR.value}, "restricted_to": empty_set},
+    8: {
+        "content": """Witch, tonight {player_hunted} has been killed by the werewolves.
+You have a bottle of antidote, would you like to save him/her? If so, say "Save", else, say "Pass".""",
+        "send_to": {RoleType.WITCH.value},
+        "restricted_to": {RoleType.MODERATOR.value, RoleType.WITCH.value},
+    },  # first check whether the witch still has the antidote, then ask whether she wants to use it to save the hunted player
+    9: {
+        "content": """Witch, you also have a bottle of poison, would you like to use it to kill one of the living players?
+Choose one from the following living options: {living_players}.
+If so, say ONLY "Poison PlayerX", replace PlayerX with the actual player name, else, say "Pass".""",
+        "send_to": {RoleType.WITCH.value},
+        "restricted_to": {RoleType.MODERATOR.value, RoleType.WITCH.value},
+    },
+    10: {"content": "Witch, close your eyes", "send_to": {RoleType.MODERATOR.value}, "restricted_to": empty_set},
+    11: {"content": "Seer, please open your eyes!", "send_to": {RoleType.MODERATOR.value}, "restricted_to": empty_set},
+    12: {
+        "content": """Seer, you can check one player's identity. Whose identity are you going to verify tonight?
+Choose only one from the following living options:{living_players}.""",
+        "send_to": {RoleType.SEER.value},
+        "restricted_to": {RoleType.MODERATOR.value, RoleType.SEER.value},
+    },
+    13: {"content": "Seer, close your eyes", "send_to": {RoleType.MODERATOR.value}, "restricted_to": empty_set},
+    # The 1-st daytime
+    14: {
+        "content": """It's daytime. Everyone woke up except those who had been killed.""",
+        "send_to": {RoleType.MODERATOR.value},
+        "restricted_to": empty_set,
+    },
+    15: {
+        "content": "{player_current_dead} was killed last night!",
+        "send_to": {RoleType.MODERATOR.value},
+        "restricted_to": empty_set,
+    },
+    16: {
+        "content": """Living players: {living_players}, now freely talk about the current situation based on your observation and
+reflection with a few sentences. Decide whether to reveal your identity based on your reflection.""",
+        "send_to": {MESSAGE_ROUTE_TO_ALL},  # send to all to speak in daytime
+        "restricted_to": empty_set,
+    },
+    17: {
+        "content": """Now vote and tell me who you think is the werewolf. Don’t mention your role.
+You only choose one from the following living options please:
+{living_players}. 
Say ONLY: I vote to eliminate ...""",
+        "send_to": {MESSAGE_ROUTE_TO_ALL},
+        "restricted_to": empty_set,
+    },
+    18: {
+        "content": """{player_current_dead} was eliminated.""",
+        "send_to": {RoleType.MODERATOR.value},
+        "restricted_to": empty_set,
+    },
+}
diff --git a/metagpt/environment/werewolf/env_space.py b/metagpt/environment/werewolf/env_space.py
new file mode 100644
index 0000000000..30a0fa01d3
--- /dev/null
+++ b/metagpt/environment/werewolf/env_space.py
@@ -0,0 +1,60 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc   : werewolf observation/action space and its action definition
+
+from gymnasium import spaces
+from pydantic import ConfigDict, Field
+
+from metagpt.environment.base_env_space import BaseEnvAction, BaseEnvActionType
+from metagpt.environment.werewolf.const import STEP_INSTRUCTIONS
+
+
+class EnvActionType(BaseEnvActionType):
+    NONE = 0  # no action to run, just get observation
+    WOLF_KILL = 1  # wolf kill someone
+    VOTE_KILL = 2  # vote kill someone
+    WITCH_POISON = 3  # witch poison someone
+    WITCH_SAVE = 4  # witch save someone
+    GUARD_PROTECT = 5  # guard protect someone
+    PROGRESS_STEP = 6  # step increment
+
+
+class EnvAction(BaseEnvAction):
+    model_config = ConfigDict(arbitrary_types_allowed=True)
+
+    action_type: int = Field(default=EnvActionType.NONE, description="action type")
+    player_name: str = Field(default="", description="the name of the player performing the action")
+    target_player_name: str = Field(default="", description="the name of the player the action is taken on")
+
+
+def get_observation_space() -> spaces.Dict:
+    space = spaces.Dict(
+        {
+            "game_setup": spaces.Text(256),
+            "step_idx": spaces.Discrete(len(STEP_INSTRUCTIONS)),
+            "living_players": spaces.Tuple(
+                (spaces.Text(16), spaces.Text(16))
+            ),  # TODO should be tuple of variable length
+            "werewolf_players": spaces.Tuple(
+                (spaces.Text(16), spaces.Text(16))
+            ),  # TODO should be tuple of variable length
+            "player_hunted": spaces.Text(16),
+            "player_current_dead": spaces.Tuple((spaces.Text(16),)),  # TODO should be tuple of variable length
+            "witch_poison_left": spaces.Discrete(2),
+            "witch_antidote_left": spaces.Discrete(2),
+            "winner": spaces.Text(16),
+            "win_reason": spaces.Text(64),
+        }
+    )
+    return space
+
+
+def get_action_space() -> spaces.Dict:
+    space = spaces.Dict(
+        {
+            "action_type": spaces.Discrete(len(EnvActionType)),
+            "player_name": spaces.Text(16),  # the player performing the action
+            "target_player_name": spaces.Text(16),  # the player the action is taken on
+        }
+    )
+    return space
diff --git a/metagpt/environment/werewolf/werewolf_env.py b/metagpt/environment/werewolf/werewolf_env.py
new file mode 100644
index 0000000000..999ff63a1c
--- /dev/null
+++ b/metagpt/environment/werewolf/werewolf_env.py
@@ -0,0 +1,41 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc   : MG Werewolf Env
+
+from typing import Iterable
+
+from pydantic import Field
+
+from metagpt.environment.base_env import Environment
+from metagpt.environment.werewolf.werewolf_ext_env import WerewolfExtEnv
+from metagpt.schema import Message
+
+
+class WerewolfEnv(WerewolfExtEnv, Environment):
+    round_cnt: int = Field(default=0)
+
+    def add_roles(self, roles: Iterable["Role"]):
+        """Add a batch of characters in the current environment"""
+        for role in roles:
+            self.roles[role.name] = role  # use name as key here, since multiple players can share the same profile
+
+        for role in roles:  # setup system message with roles
+            role.context = self.context
+            role.set_env(self)
+
+    def publish_message(self, 
message: Message, add_timestamp: bool = True): + """Post information to the current environment""" + if add_timestamp: + # Because the content of the message may be repeated, for example, killing the same person in two nights + # Therefore, a unique round_cnt prefix needs to be added so that the same message will not be automatically deduplicated when added to the memory. + message.content = f"{self.round_cnt} | " + message.content + super().publish_message(message) + + async def run(self, k=1): + """Process all Role runs by order""" + for _ in range(k): + for role in self.roles.values(): + await role.run() + self.round_cnt += 1 diff --git a/metagpt/environment/werewolf_env/werewolf_ext_env.py b/metagpt/environment/werewolf/werewolf_ext_env.py similarity index 54% rename from metagpt/environment/werewolf_env/werewolf_ext_env.py rename to metagpt/environment/werewolf/werewolf_ext_env.py index 7c4b4c475a..a8636536b1 100644 --- a/metagpt/environment/werewolf_env/werewolf_ext_env.py +++ b/metagpt/environment/werewolf/werewolf_ext_env.py @@ -4,109 +4,27 @@ import random from collections import Counter -from enum import Enum -from typing import Callable, Optional +from typing import Any, Callable, Optional from pydantic import ConfigDict, Field from metagpt.environment.base_env import ExtEnv, mark_as_readable, mark_as_writeable +from metagpt.environment.base_env_space import BaseEnvObsParams +from metagpt.environment.werewolf.const import STEP_INSTRUCTIONS, RoleState, RoleType +from metagpt.environment.werewolf.env_space import EnvAction, EnvActionType from metagpt.logs import logger -class RoleState(Enum): - ALIVE = "alive" # the role is alive - KILLED = "killed" # the role is killed by werewolf or voting - POISONED = "poisoned" # the role is killed by posion - SAVED = "saved" # the role is saved by antidote - - -# the ordered rules by the moderator to announce to everyone each step -STEP_INSTRUCTIONS = { - 0: { - "content": "It’s dark, everyone close your eyes. I will talk with you/your team secretly at night.", - "send_to": "Moderator", # for moderator to continuen speaking - "restricted_to": "", - }, - 1: { - "content": "Guard, please open your eyes!", - "send_to": "Moderator", # for moderator to continuen speaking - "restricted_to": "", - }, - 2: { - "content": """Guard, now tell me who you protect tonight? - You only choose one from the following living options please: {living_players}. - Or you can pass. For example: Protect ...""", - "send_to": "Guard", - "restricted_to": "Moderator,Guard", - }, - 3: {"content": "Guard, close your eyes", "send_to": "Moderator", "restricted_to": ""}, - 4: {"content": "Werewolves, please open your eyes!", "send_to": "Moderator", "restricted_to": ""}, - 5: { - "content": """Werewolves, I secretly tell you that {werewolf_players} are - all of the 2 werewolves! Keep in mind you are teammates. The rest players are not werewolves. - choose one from the following living options please: - {living_players}. For example: Kill ...""", - "send_to": "Werewolf", - "restricted_to": "Moderator,Werewolf", - }, - 6: {"content": "Werewolves, close your eyes", "send_to": "Moderator", "restricted_to": ""}, - 7: {"content": "Witch, please open your eyes!", "send_to": "Moderator", "restricted_to": ""}, - 8: { - "content": """Witch, tonight {player_hunted} has been killed by the werewolves. - You have a bottle of antidote, would you like to save him/her? 
If so, say "Save", else, say "Pass".""", - "send_to": "Witch", - "restricted_to": "Moderator,Witch", - }, # 要先判断女巫是否有解药,再去询问女巫是否使用解药救人 - 9: { - "content": """Witch, you also have a bottle of poison, would you like to use it to kill one of the living players? - Choose one from the following living options: {living_players}. - If so, say ONLY "Poison PlayerX", replace PlayerX with the actual player name, else, say "Pass".""", - "send_to": "Witch", - "restricted_to": "Moderator,Witch", - }, # - 10: {"content": "Witch, close your eyes", "send_to": "Moderator", "restricted_to": ""}, - 11: {"content": "Seer, please open your eyes!", "send_to": "Moderator", "restricted_to": ""}, - 12: { - "content": """Seer, you can check one player's identity. Who are you going to verify its identity tonight? - Choose only one from the following living options:{living_players}.""", - "send_to": "Seer", - "restricted_to": "Moderator,Seer", - }, - 13: {"content": "Seer, close your eyes", "send_to": "Moderator", "restricted_to": ""}, - # The 1-st daytime - 14: { - "content": """It's daytime. Everyone woke up except those who had been killed.""", - "send_to": "Moderator", - "restricted_to": "", - }, - 15: {"content": "{player_current_dead} was killed last night!", "send_to": "Moderator", "restricted_to": ""}, - 16: { - "content": """Living players: {living_players}, now freely talk about the current situation based on your observation and - reflection with a few sentences. Decide whether to reveal your identity based on your reflection.""", - "send_to": "", # send to all to speak in daytime - "restricted_to": "", - }, - 17: { - "content": """Now vote and tell me who you think is the werewolf. Don’t mention your role. - You only choose one from the following living options please: - {living_players}. 
Say ONLY: I vote to eliminate ...""", - "send_to": "", - "restricted_to": "", - }, - 18: {"content": """{player_current_dead} was eliminated.""", "send_to": "Moderator", "restricted_to": ""}, -} - - class WerewolfExtEnv(ExtEnv): model_config = ConfigDict(arbitrary_types_allowed=True) players_state: dict[str, tuple[str, RoleState]] = Field( - default=dict(), description="the player's role type and state by player_name" + default_factory=dict, description="the player's role type and state by player_name" ) round_idx: int = Field(default=0) # the current round step_idx: int = Field(default=0) # the current step of current round - eval_step_idx: int = Field(default=0) + eval_step_idx: list[int] = Field(default=[]) per_round_steps: int = Field(default=len(STEP_INSTRUCTIONS)) # game global states @@ -114,13 +32,13 @@ class WerewolfExtEnv(ExtEnv): special_role_players: list[str] = Field(default=[]) winner: Optional[str] = Field(default=None) win_reason: Optional[str] = Field(default=None) - witch_poison_left: int = Field(default=1) - witch_antidote_left: int = Field(default=1) + witch_poison_left: int = Field(default=1, description="should be 1 or 0") + witch_antidote_left: int = Field(default=1, description="should be 1 or 0") # game current round states, a round is from closing your eyes to the next time you close your eyes - round_hunts: dict[str, str] = Field(default=dict(), description="nighttime wolf hunt result") + round_hunts: dict[str, str] = Field(default_factory=dict, description="nighttime wolf hunt result") round_votes: dict[str, str] = Field( - default=dict(), description="daytime all players vote result, key=voteer, value=voted one" + default_factory=dict, description="daytime all players vote result, key=voter, value=voted one" ) player_hunted: Optional[str] = Field(default=None) player_protected: Optional[str] = Field(default=None) @@ -128,6 +46,76 @@ class WerewolfExtEnv(ExtEnv): player_poisoned: Optional[str] = Field(default=None) player_current_dead: list[str] = Field(default=[]) + def reset( + self, + *, + seed: Optional[int] = None, + options: Optional[dict[str, Any]] = None, + ) -> tuple[dict[str, Any], dict[str, Any]]: + """currently unused""" + pass + + def observe(self, obs_params: Optional[BaseEnvObsParams] = None) -> Any: + """currently unused""" + pass + + def _get_obs(self): + return { + "game_setup": self.game_setup, + "step_idx": self.step_idx, + "living_players": self.living_players, + "werewolf_players": self.werewolf_players, # currently, lack observation isolation + "player_hunted": self.player_hunted, + "player_current_dead": self.player_current_dead, + "witch_poison_left": self.witch_poison_left, + "witch_antidote_left": self.witch_antidote_left, + "winner": self.winner, + "win_reason": self.win_reason, + } + + def step(self, action: EnvAction) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]: + action_type = action.action_type + player_name = action.player_name + target_player_name = action.target_player_name + if action_type == EnvActionType.WOLF_KILL: + self.wolf_kill_someone(wolf_name=player_name, player_name=target_player_name) + elif action_type == EnvActionType.VOTE_KILL: + self.vote_kill_someone(voter_name=player_name, player_name=target_player_name) + elif action_type == EnvActionType.WITCH_POISON: + self.witch_poison_someone(witch_name=player_name, player_name=target_player_name) + elif action_type == EnvActionType.WITCH_SAVE: + self.witch_save_someone(witch_name=player_name, player_name=target_player_name) + elif action_type == 
EnvActionType.GUARD_PROTECT:
+            self.guard_protect_someone(guard_name=player_name, player_name=target_player_name)
+        elif action_type == EnvActionType.PROGRESS_STEP:
+            self.progress_step()
+        elif action_type == EnvActionType.NONE:
+            pass
+        else:
+            raise ValueError(f"not supported action_type: {action_type}")
+
+        self.update_game_states()
+        terminated = self._check_game_finish()
+        obs = self._get_obs()
+        return obs, 1.0, terminated, False, {}
+
+    def _check_game_finish(self) -> bool:
+        """return True if game finished else False"""
+        # game's termination condition
+        terminated = False
+        living_werewolf = [p for p in self.werewolf_players if p in self.living_players]
+        living_villagers = [p for p in self.villager_players if p in self.living_players]
+        living_special_roles = [p for p in self.special_role_players if p in self.living_players]
+        if not living_werewolf:
+            self.winner = "good guys"
+            self.win_reason = "werewolves all dead"
+            terminated = True
+        elif not living_villagers or not living_special_roles:
+            self.winner = "werewolf"
+            self.win_reason = "villagers all dead" if not living_villagers else "special roles all dead"
+            terminated = True
+        return terminated
+
     @property
     def living_players(self) -> list[str]:
         player_names = []
@@ -146,12 +134,12 @@ def _role_type_players(self, role_type: str) -> list[str]:
 
     @property
     def werewolf_players(self) -> list[str]:
-        player_names = self._role_type_players(role_type="Werewolf")
+        player_names = self._role_type_players(role_type=RoleType.WEREWOLF.value)
         return player_names
 
     @property
     def villager_players(self) -> list[str]:
-        player_names = self._role_type_players(role_type="Villager")
+        player_names = self._role_type_players(role_type=RoleType.VILLAGER.value)
         return player_names
 
     def _init_players_state(self, players: list["Role"]):
@@ -178,14 +166,14 @@ def init_game_setup(
         """init players using different roles' num"""
         role_objs = []
         for role_obj in role_uniq_objs:
-            if str(role_obj) == "Villager":
+            if RoleType.VILLAGER.value in str(role_obj):
                 role_objs.extend([role_obj] * num_villager)
-            elif str(role_obj) == "Werewolf":
+            elif RoleType.WEREWOLF.value in str(role_obj):
                 role_objs.extend([role_obj] * num_werewolf)
             else:
                 role_objs.append(role_obj)
         if shuffle:
-            random.shuffle(len(role_objs))
+            random.shuffle(role_objs)
         if add_human:
             assigned_role_idx = random.randint(0, len(role_objs) - 1)
             assigned_role = role_objs[assigned_role_idx]
@@ -218,10 +206,12 @@ def _update_players_state(self, player_names: list[str], state: RoleState = Role
             roletype_state = self.players_state[player_name]
             self.players_state[player_name] = (roletype_state[0], state)
 
-    def _check_valid_role(self, player: "Role", role_type: str) -> bool:
-        return True if role_type in str(player) else False
+    def _check_valid_role(self, player_name: str, role_type: str) -> bool:
+        roletype_state = self.players_state.get(player_name)
+        return True if roletype_state and role_type in roletype_state[0] else False
 
     def _check_player_continue(self, player_name: str, particular_step: int = -1) -> bool:
+        """check whether the operation can currently be applied to this player"""
         step_idx = self.step_idx % self.per_round_steps
         if particular_step > 0 and step_idx != particular_step:  # step no
             # particular_step = 18, not daytime vote time, ignore
@@ -238,6 +228,10 @@ def curr_step_instruction(self) -> dict:
         self.step_idx += 1
         return instruction
 
+    @mark_as_writeable
+    def progress_step(self):
+        self.step_idx += 1
+
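With this dispatch in place, a game driver no longer calls the write APIs directly; it feeds `EnvAction`s into `step()`. A small sketch, assuming `env` is an already constructed `WerewolfExtEnv` and the player names are hypothetical:

```python
from metagpt.environment.werewolf.env_space import EnvAction, EnvActionType

# Night phase: a werewolf picks a victim.
kill = EnvAction(action_type=EnvActionType.WOLF_KILL, player_name="Player1", target_player_name="Player5")
obs, reward, terminated, truncated, info = env.step(kill)

# Advance the moderator script without any player action.
env.step(EnvAction(action_type=EnvActionType.PROGRESS_STEP))
```

     @mark_as_readable
     def get_players_state(self, player_names: list[str]) -> dict[str, RoleState]:
         players_state 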
@@ -248,57 +242,72 @@ def get_players_state(self, player_names: list[str]) -> dict[str, RoleState]:
         return players_state
 
     @mark_as_writeable
-    def vote_kill_someone(self, voteer: "Role", player_name: str = None):
+    def vote_kill_someone(self, voter_name: str, player_name: str = None):
         """player vote result at daytime
         player_name: if it's None, regard as abstaining from voting
         """
-        if not self._check_player_continue(voteer.name, particular_step=18):  # 18=step no
+        if not self._check_player_continue(voter_name, particular_step=18):  # 18=step no
             return
 
-        self.round_votes[voteer.name] = player_name
+        self.round_votes[voter_name] = player_name
         # check if all living players finish voting, then get the dead one
         if list(self.round_votes.keys()) == self.living_players:
             voted_all = list(self.round_votes.values())  # TODO in case of tie vote, check who was voted first
             voted_all = [item for item in voted_all if item]
-            self.player_current_dead = Counter(voted_all).most_common()[0][0]
-            self._update_players_state([self.player_current_dead])
+            self.player_current_dead = [Counter(voted_all).most_common()[0][0]]
+            self._update_players_state(self.player_current_dead)
 
     @mark_as_writeable
-    def wolf_kill_someone(self, wolf: "Role", player_name: str):
-        if not self._check_valid_role(wolf, "Werewolf"):
+    def wolf_kill_someone(self, wolf_name: str, player_name: str):
+        if not self._check_valid_role(wolf_name, RoleType.WEREWOLF.value):
             return
-        if not self._check_player_continue(wolf.name, particular_step=5):  # 5=step no
+        if not self._check_player_continue(wolf_name, particular_step=6):  # 6=step no
             return
 
-        self.round_hunts[wolf.name] = player_name
-        living_werewolf = [p for p in self.werewolf_players if p in self.living_players]
+        self.round_hunts[wolf_name] = player_name
+        # living_werewolf = [p for p in self.werewolf_players if p in self.living_players]
         # check if all living wolfs finish hunting, then get the hunted one
-        if list(self.round_hunts.keys()) == living_werewolf:
-            hunted_all = list(self.round_hunts.values())
-            self.player_hunted = Counter(hunted_all).most_common()[0][0]
-
-    @mark_as_writeable
-    def witch_poison_someone(self, witch: "Role", player_name: str = None):
-        if not self._check_valid_role(witch, "Witch"):
+        # if list(self.round_hunts.keys()) == living_werewolf:
+        #     hunted_all = list(self.round_hunts.values())
+        #     self.player_hunted = Counter(hunted_all).most_common()[0][0]
+        self.player_hunted = player_name
+
+    def _witch_poison_or_save_someone(
+        self, witch_name: str, player_name: str = None, state: RoleState = RoleState.POISONED
+    ):
+        if not self._check_valid_role(witch_name, RoleType.WITCH.value):
             return
         if not self._check_player_continue(player_name):
             return
 
-        self._update_players_state([player_name], RoleState.POISONED)
-        self.player_poisoned = player_name
+        assert state in [RoleState.POISONED, RoleState.SAVED]
+        self._update_players_state([player_name], state)
+        if state == RoleState.POISONED:
+            self.player_poisoned = player_name
+            self.witch_poison_left -= 1
+        else:
+            # self.player_protected = player_name
+            self.is_hunted_player_saved = True
+            self.witch_antidote_left -= 1
 
     @mark_as_writeable
-    def witch_save_someone(self, witch: "Role", player_name: str = None):
-        if not self._check_valid_role(witch, "Witch"):
+    def witch_poison_someone(self, witch_name: str, player_name: str = None):
+        self._witch_poison_or_save_someone(witch_name, player_name, RoleState.POISONED)
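A quick illustration of the one-use potion bookkeeping that the shared helper above centralizes. Player names are made up, and the step gating in `_check_player_continue` is assumed to pass at the corresponding steps.

```python
env.wolf_kill_someone(wolf_name="Player2", player_name="Player5")    # night hunt
env.witch_save_someone(witch_name="Player7", player_name="Player5")  # antidote spent
# is_hunted_player_saved is now True and witch_antidote_left has dropped by 1,
# so the night resolution in update_game_states() will spare Player5.
```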
+
+    @mark_as_writeable
+    def witch_save_someone(self, witch_name: str, player_name: str = None):
+        self._witch_poison_or_save_someone(witch_name, player_name, RoleState.SAVED)
+
+    @mark_as_writeable
+    def guard_protect_someone(self, guard_name: str, player_name: str = None):
+        if not self._check_valid_role(guard_name, RoleType.GUARD.value):
             return
         if not self._check_player_continue(player_name):
             return
-
-        self._update_players_state([player_name], RoleState.SAVED)
         self.player_protected = player_name
 
     @mark_as_writeable
-    def update_game_states(self, memories: list):
+    def update_game_states(self):
         step_idx = self.step_idx % self.per_round_steps
         if step_idx not in [15, 18] or self.step_idx in self.eval_step_idx:
             return
@@ -314,22 +323,12 @@ def update_game_states(self, memories: list):
             if self.player_poisoned:
                 self.player_current_dead.append(self.player_poisoned)
 
-            self._update_players_state([self.player_current_dead])
+            self._update_players_state(self.player_current_dead)
 
             # reset
             self.player_hunted = None
             self.player_protected = None
             self.is_hunted_player_saved = False
             self.player_poisoned = None
-
-            # game's termination condition
-            living_werewolf = [p for p in self.werewolf_players if p in self.living_players]
-            living_villagers = [p for p in self.villager_players if p in self.living_players]
-            living_special_roles = [p for p in self.special_role_players if p in self.living_players]
-            if not living_werewolf:
-                self.winner = "good guys"
-                self.win_reason = "werewolves all dead"
-            elif not living_villagers or not living_special_roles:
-                self.winner = "werewolf"
-                self.win_reason = "villagers all dead" if not living_villagers else "special roles all dead"
-            if self.winner is not None:
-                self._record_all_experiences()  # TODO
+        elif step_idx == 18:
+            # the day-vote death was already applied in vote_kill_someone
+            pass
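The night-resolution rule applied at step 15 above (the hunted player dies unless guarded or saved, as implied by the state flags, while a poisoned player dies regardless) can be restated as a small pure function for clarity. This sketch is illustrative only and is not code from the diff.

```python
from typing import Optional


def resolve_night(hunted: Optional[str], protected: Optional[str], saved: bool, poisoned: Optional[str]) -> list[str]:
    """Illustrative restatement: who dies tonight given the env's state flags."""
    dead = []
    if hunted and hunted != protected and not saved:
        dead.append(hunted)  # wolves' target, neither guarded nor saved
    if poisoned:
        dead.append(poisoned)  # the witch's poison always kills
    return dead


assert resolve_night("P5", None, False, None) == ["P5"]  # hunt succeeds
assert resolve_night("P5", "P5", False, None) == []      # guard protected the target
assert resolve_night("P5", None, True, "P2") == ["P2"]   # witch saved P5 but poisoned P2
```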
diff --git a/metagpt/environment/werewolf_env/werewolf_env.py b/metagpt/environment/werewolf_env/werewolf_env.py
deleted file mode 100644
index d174f322ca..0000000000
--- a/metagpt/environment/werewolf_env/werewolf_env.py
+++ /dev/null
@@ -1,31 +0,0 @@
-#!/usr/bin/env python
-# -*- coding: utf-8 -*-
-# @Desc   : MG Werewolf Env
-
-from pydantic import Field
-
-from metagpt.environment.base_env import Environment
-from metagpt.environment.werewolf_env.werewolf_ext_env import WerewolfExtEnv
-from metagpt.logs import logger
-from metagpt.schema import Message
-
-
-class WerewolfEnv(Environment, WerewolfExtEnv):
-    timestamp: int = Field(default=0)
-
-    def publish_message(self, message: Message, add_timestamp: bool = True):
-        """Post information to the current environment"""
-        logger.debug(f"publish_message: {message.dump()}")
-        if add_timestamp:
-            # Because the content of the message may be repeated, for example, killing the same person in two nights
-            # Therefore, a unique timestamp prefix needs to be added so that the same message will not be automatically deduplicated when added to the memory.
-            message.content = f"{self.timestamp} | " + message.content
-        self.memory.add(message)
-        self.history += f"\n{message}"
-
-    async def run(self, k=1):
-        """Process all Role runs by order"""
-        for _ in range(k):
-            for role in self.roles.values():
-                await role.run()
-            self.timestamp += 1
diff --git a/metagpt/ext/__init__.py b/metagpt/ext/__init__.py
new file mode 100644
index 0000000000..2bcf8efd09
--- /dev/null
+++ b/metagpt/ext/__init__.py
@@ -0,0 +1,3 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc   :
diff --git a/metagpt/ext/android_assistant/README.md b/metagpt/ext/android_assistant/README.md
new file mode 100644
index 0000000000..fe8b4b3e32
--- /dev/null
+++ b/metagpt/ext/android_assistant/README.md
@@ -0,0 +1,118 @@
+# MetaGPT Android Assistant
+
+The MetaGPT Android Assistant is an intelligent assistant driven by a multi-modal large language model and built on the MetaGPT framework. It can learn users' daily usage patterns on its own and, following plain-text instructions, automatically carry out operations across a wide range of apps, leaving the user's hands entirely free.
+Below we introduce the features of the MetaGPT Android Assistant and how to use it.
+
+## Features
+
+The MetaGPT Android Assistant operates in two stages: learning and automatic execution. We introduce its features by walking through these two stages.
+
+### Learning Stage
+
+By learning from human demonstrations, or by exploring an app according to human instructions, the MetaGPT Android Assistant learns the app's functionality and generates operation documents that the later "automatic execution" stage consumes. Roughly 20 rounds of exploration for a given task objective can significantly improve performance.
+
+Set `stage` to `learn` to put the Android Assistant into the learning stage. Set `mode` to `auto` to have it learn through automatic exploration, or to `manual` to have it learn from a human demonstration. The Usage section explains the script parameters in detail. You can experiment with the automatic exploration and manual demonstration modes on the "Messenger" app with the following commands:
+
+```bash
+cd examples/android_assistant
+python run_assistant.py "Send 'When will we release this feature?' to +86 8888888" --stage "learn" --mode "auto or manual" --app-name "Messenger"
+```
+
+#### Learning Based on Human Demonstration
+While the Android Assistant explores on its own during the learning stage, your hands stay free. When you instruct it to learn from your demonstration instead, you need to respond to the prompts in the terminal so that it can record your operations accurately.
+A possible example is as follows:
+
+```bash
+cd examples/android_assistant
+python run_assistant.py "Send 'When will we release this feature?' to +86 8888888" --stage "learn" --mode "manual" --app-name "Messenger"
+```
+
+After running this command, you will first see a screenshot of the Android screen with each interactive element labeled, as shown in the figure below:
+
+
+
+Note the numeric tag of the element you want to operate on; the terminal will then print requests like the ones below. Answer them to teach the Android Assistant your demonstrated actions:
+
+```bash
+| INFO     | examples.android_assistant.actions.manual_record:run:96 - Which element do you want to tap? Choose a numeric tag from 1 to 11:
+user_input: 8
+| INFO     | examples.android_assistant.actions.manual_record:run:81 - Choose one of the following actions you want to perform on the current screen:
+tap, text, long_press, swipe, stop
+user_input: tap
+```
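Behind the scenes, `ManualRecord` (added later in this PR) appends one line per demonstrated action to a `record.txt` and closes the file with `stop`. A recorded demo might look like the following; the line format follows the `log_str` templates in `ManualRecord`, while the element uids here are made up:

```text
tap(8):::android.widget.Button_send
text(3:sep:'When will we release this feature?'):::android.widget.EditText_input
swipe(5:sep:up):::androidx.recyclerview.widget.RecyclerView_chat
stop
```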
+### Automatic Execution Stage
+Once the Android Assistant has completed the learning stage, you can instruct it to carry out tasks on the phone with a plain-text description. Supplied with the operation documents produced during learning, it has richer prior knowledge and executes tasks more reliably.
+You can instruct the Android Assistant to send messages in the "Messenger" app with the following command:
+```bash
+python run_assistant.py "Send 'When will we release this feature?' to +86 8888888" --stage "act" --mode "auto or manual" --app-name "Messenger"
+```
+With `mode` set to `auto`, the Android Assistant uses the operation records compiled through self-exploration; with `mode` set to `manual`, it uses the operation documents accumulated from learning via human demonstration.
+
+## Installation
+To use the Android Assistant, you first need to satisfy the following prerequisites:
+1. Complete the installation of the MetaGPT environment.
+2. Install [Android Debug Bridge (ADB)](https://developer.android.com/tools/adb?hl=zh-cn) on your PC; ADB enables interaction between your PC and Android devices.
+3. Install Android Studio and, within it, the Android emulator, which gives the Android Assistant an environment to learn and act in. For how to install the Android emulator, refer to [Quick Installation of Android Studio & Emulator](https://docs.expo.dev/workflow/android-studio-emulator/).
+4. (Optional) Connect your Android device to a USB port of your PC; this can likewise serve as the environment in which the Android Assistant learns and acts.
+
+Note ⚠️: When working with the Android emulator, we use the Medium Phone device profile; first-time users are advised to start with this profile as well.
+
+After completing these steps, you can enter the following command to check that ADB is installed and the Android device is connected:
+```bash
+adb devices
+```
+
+## Usage
+The MetaGPT Android Assistant is designed within the MetaGPT framework as a `Role` together with multiple `Action`s. You run it by executing the `run_assistant.py` script. The script's parameters are described below:
+```text
+Usage: run_assistant.py [OPTIONS] TASK_DESC
+
+  Run an Android Assistant
+
+Arguments:
+  TASK_DESC  the task description you want the android assistant to learn or
+             act on  [required]
+
+Options:
+  --n-round INTEGER               The max round to do an app operation task.
+                                  [default: 20]
+  --stage TEXT                    stage: learn / act  [default: learn]
+  --mode TEXT                     mode: auto / manual, when stage=learn
+                                  [default: auto]
+  --app-name TEXT                 the name of the app you want to run
+                                  [default: demo]
+  --investment FLOAT              Dollar amount to invest in the AI company.
+                                  [default: 5.0]
+  --refine-doc / --no-refine-doc  Refine existing operation docs based on the
+                                  latest observation if True.
+                                  [default: no-refine-doc]
+  --min-dist INTEGER              The minimum distance between elements to
+                                  prevent overlapping during the labeling
+                                  process.  [default: 30]
+  --android-screenshot-dir TEXT   The path to store screenshots on the android
+                                  device. Make sure it exists.  [default:
+                                  /sdcard/Pictures/Screenshots]
+  --android-xml-dir TEXT          The path to store XML files used to
+                                  determine the location of UI elements. Make
+                                  sure it exists.  [default: /sdcard]
+  --device-id TEXT                The Android device_id  [default:
+                                  emulator-5554]
+  --help                          Show this message and exit.
+```
+
+## Acknowledgements
+The MetaGPT Android Assistant draws on some ideas and code from the [AppAgent](https://github.com/mnotgod96/AppAgent) project. We thank the developers of the AppAgent project.
+
+### Citation
+
+```bib
+@misc{yang2023appagent,
+      title={AppAgent: Multimodal Agents as Smartphone Users},
+      author={Chi Zhang and Zhao Yang and Jiaxuan Liu and Yucheng Han and Xin Chen and Zebiao Huang and Bin Fu and Gang Yu},
+      year={2023},
+      eprint={2312.13771},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```
\ No newline at end of file
diff --git a/metagpt/ext/android_assistant/README_CN.md b/metagpt/ext/android_assistant/README_CN.md
new file mode 100644
index 0000000000..a1abbe3b0b
--- /dev/null
+++ b/metagpt/ext/android_assistant/README_CN.md
@@ -0,0 +1,113 @@
+# MetaGPT 安卓助理
+
+MetaGPT安卓助理是一款依托于先进的MetaGPT框架构建的多模态大语言模型驱动的智能辅助工具。
+它具备自我学习的能力，能够通过学习掌握用户的日常使用方式，同时能够根据用户的指令自动完成各类应用程序的操作任务，实现了用户双手的全面解放。
+接下来，我们将介绍MetaGPT安卓助理的功能以及如何使用它。
+
+## 功能
+
+MetaGPT 安卓助理的执行主要包含两个阶段，分别为自我学习与自动执行。下面，我们将从这两个阶段介绍MetaGPT 安卓助理的具体功能。
+
+### 自我学习阶段
+
+通过学习人类演示或基于人类指令对app进行探索，MetaGPT安卓助理可以对app的功能进行学习，生成相应的操作文档，为后续的"自动执行"阶段使用。对于任何给定的任务目标，进行约20轮的探索可以显著提高性能。
+
+通过设定`stage`为`learn`可要求安卓助理进入自我学习阶段。通过设定`mode`为`auto`，可要求安卓助理通过自动探索学习，通过设定`mode`为`manual`，可要求安卓助理通过人类手动演示学习。在使用章节，我们对脚本的参数进行了详细的说明。
+您可以尝试对"Messenger"应用程序进行自动探索和手动演示模式的实验，具体命令如下：
+
+```bash
+cd examples/android_assistant
+python run_assistant.py "Send 'When will we release this feature?' to +86 8888888" --stage "learn" --mode "auto or manual" --app-name "Messenger"
+```
+
+#### 基于人类演示的学习
+在要求安卓助理在自我学习阶段执行自我探索时，您可以解放您的双手；但在要求它根据您的指令进行学习时，你需要根据终端中的指令进行输入，以便安卓助理能够准确地学习您的操作方式。
+一个可能的例子如下：
+
+```bash
+cd examples/android_assistant
+python run_assistant.py "Send 'When will we release this feature?' to +86 8888888" --stage "learn" --mode "manual" --app-name "Messenger"
+```
+
+在运行这一指令后，你将首先看到一个在各个可交互的位置进行了标记的安卓屏幕的截图，如下图：
+
+
+
+在记住你要操作的位置之后，终端中将会输出与下面类似的要求，回复它，进而指挥安卓助理学习你的演示行为：
+
+```bash
+| INFO     | examples.android_assistant.actions.manual_record:run:96 - Which element do you want to tap? Choose a numeric tag from 1 to 11:
+user_input: 8
+| INFO     | examples.android_assistant.actions.manual_record:run:81 - Choose one of the following actions you want to perform on the current screen:
+tap, text, long_press, swipe, stop
+user_input: tap
+```
+### 自动执行阶段
+在安卓助理完成了自我学习阶段之后，您可以通过文本描述的方式，指挥安卓助理在手机中完成任务。通过为其配置自我学习阶段的操作文档，安卓助理具备了更丰富的前置知识，执行能力进一步得到提升。
+你可以通过以下指令，指挥安卓助理在"Messenger"应用中发送信息：
+```bash
+python run_assistant.py "Send 'When will we release this feature?' to +86 8888888" --stage "act" --mode "auto or manual" --app-name "Messenger"
+```
+其中，`mode`选择`auto`，安卓助理将使用自我探索中积累的操作文档；`mode`选择`manual`，安卓助理将使用人类演示学习中积累的操作文档。
+
+## 安装
+为了使用安卓助理，你首先需要满足以下条件：
+1. 完成MetaGPT环境的安装
+2. 在你的PC上安装[Android Debug Bridge(ADB)](https://developer.android.com/tools/adb?hl=zh-cn)，ADB可以使你的PC与安卓设备进行交互。
+3. 安装Android Studio，在其中安装Android模拟器，以为安卓助手提供学习与执行的环境。关于如何安装Android模拟器，可以参考[快速安装Android Studio & Emulator](https://dev.weixin.qq.com/docs/framework/dev/framework/env/android-simulator.html)。
+4. （可选）将你的安卓设备连接到PC的USB端口上，这同样可以为安卓助手提供学习与执行的环境。
+
+注意 ⚠️：在使用Android模拟器进行操作时，我们使用的模拟器型号为Medium Phone，建议第一次尝试此类应用的用户使用这一型号完成操作。
+
+在完成这一系列操作之后，你可以输入以下命令检查ADB是否安装成功，以及安卓设备是否连接：
+```bash
+adb devices
+```
+## 使用
+MetaGPT 安卓助理在MetaGPT框架中被设计为一个`Role`与多个`Action`的集合，你可以通过运行`run_assistant.py`脚本来运行它。这一脚本具体的参数说明如下：
+```text
+用法：run_assistant.py [选项] 任务描述
+
+    运行一个安卓助手
+
+参数:
+  TASK_DESC                       你希望安卓助手学习或执行的任务描述
+                                  [必需]
+
+选项:
+  --n-round 整数                  执行应用程序操作任务的最大轮数。
+                                  [默认值：20]
+  --stage 文本                    阶段：learn/act  [默认值：learn]
+  --mode 文本                     模式：auto/manual，当 stage=learn 时  [默认值：auto]
+  --app-name 文本                 你想要运行的应用程序名称  [默认值：demo]
+  --investment 浮点数             投资于人工智能公司的美元金额。
+                                  [默认值：5.0]
+  --refine-doc / --no-refine-doc  如果为真，则根据最新的观察结果优化现有操作文档。
+                                  [默认值：--no-refine-doc]
+  --min-dist 整数                 在标记过程中防止元素重叠的最小元素间距。
+                                  [默认值：30]
+  --android-screenshot-dir 文本   在安卓设备上存储截图的路径。确保其存在。
+                                  [默认值：/sdcard/Pictures/Screenshots]
+  --android-xml-dir 文本          存储用于确定UI元素位置的XML文件的路径。
+                                  确保其存在。[默认值：/sdcard]
+  --device-id 文本                安卓device_id  [默认值：emulator-5554]
+  --help                          显示此信息并退出。
+```
+
+## 致谢
+MetaGPT 安卓助理参考了 [AppAgent](https://github.com/mnotgod96/AppAgent) 项目的部分思路与代码，感谢 AppAgent 项目的开发者们。
+
+### 引用
+
+```bib
+@misc{yang2023appagent,
+      title={AppAgent: Multimodal Agents as Smartphone Users},
+      author={Chi Zhang and Zhao Yang and Jiaxuan Liu and Yucheng Han and Xin Chen and Zebiao Huang and Bin Fu and Gang Yu},
+      year={2023},
+      eprint={2312.13771},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```
\ No newline at end of file
diff --git a/metagpt/ext/android_assistant/__init__.py b/metagpt/ext/android_assistant/__init__.py
new file mode 100644
index 0000000000..2bcf8efd09
--- /dev/null
+++ b/metagpt/ext/android_assistant/__init__.py
@@ -0,0 +1,3 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc   :
diff --git a/metagpt/ext/android_assistant/actions/__init__.py b/metagpt/ext/android_assistant/actions/__init__.py
new file mode 100644
index 0000000000..2bcf8efd09
--- /dev/null
+++ b/metagpt/ext/android_assistant/actions/__init__.py
@@ -0,0 +1,3 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc   :
diff --git a/metagpt/ext/android_assistant/actions/manual_record.py b/metagpt/ext/android_assistant/actions/manual_record.py
new file mode 100644
index 0000000000..bcfb2ed893
--- /dev/null
+++ b/metagpt/ext/android_assistant/actions/manual_record.py
@@ -0,0 +1,168 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc   : manual record user interaction in stage=learn & mode=manual, LIKE scripts/step_recorder.py
+import time
+from pathlib import Path
+
+import cv2
+
+from metagpt.actions.action import Action
+from metagpt.config2 import config
+from metagpt.environment.android.android_env import AndroidEnv
+from metagpt.environment.android.const import ADB_EXEC_FAIL
+from metagpt.environment.android.env_space import (
+    EnvAction,
+    EnvActionType,
+    EnvObsParams,
+    EnvObsType,
+)
+from metagpt.ext.android_assistant.utils.schema import (
+    ActionOp,
+    AndroidActionOutput,
+    RunState,
+    SwipeOp,
+)
+from metagpt.ext.android_assistant.utils.utils import (
+    draw_bbox_multi,
+    elem_list_from_xml_tree,
+)
+from metagpt.logs import logger
+
+
+class ManualRecord(Action):
+    """record a human demonstration on the screen from terminal input"""
+
+    name: str = "ManualRecord"
+
+    useless_list: list[str] = []  # store the uids of useless elements
+    record_path: Path = ""
+    task_desc_path: Path = ""
+    screenshot_before_path: Path =
"" + screenshot_after_path: Path = "" + xml_path: Path = "" + + async def run(self, task_desc: str, task_dir: Path, env: AndroidEnv): + self.record_path = Path(task_dir) / "record.txt" + self.task_desc_path = Path(task_dir) / "task_desc.txt" + self.screenshot_before_path = Path(task_dir) / "raw_screenshots" + self.screenshot_after_path = Path(task_dir) / "labeled_screenshots" + self.xml_path = Path(task_dir) / "xml" + for path in [self.screenshot_before_path, self.screenshot_after_path, self.xml_path]: + path.mkdir(parents=True, exist_ok=True) + + self.record_path.write_text("") + record_file = open(self.record_path, "w") + self.task_desc_path.write_text(task_desc) + + step = 0 + extra_config = config.extra + while True: + step += 1 + screenshot_path: Path = env.observe( + EnvObsParams( + obs_type=EnvObsType.GET_SCREENSHOT, ss_name=f"{step}", local_save_dir=self.screenshot_before_path + ) + ) + xml_path: Path = env.observe( + EnvObsParams(obs_type=EnvObsType.GET_XML, xml_name=f"{step}", local_save_dir=self.xml_path) + ) + if not screenshot_path.exists() or not xml_path.exists(): + return AndroidActionOutput(action_state=RunState.FAIL) + + elem_list = elem_list_from_xml_tree(xml_path, self.useless_list, extra_config.get("min_dist", 30)) + + screenshot_labeled_path = Path(self.screenshot_after_path).joinpath(f"{step}_labeled.png") + labeled_img = draw_bbox_multi(screenshot_path, screenshot_labeled_path, elem_list) + + cv2.namedWindow("image", cv2.WINDOW_NORMAL) + cv2.imshow("image", labeled_img) + cv2.waitKey(0) + cv2.destroyAllWindows() + + user_input = "xxx" + logger.info( + "Choose one of the following actions you want to perform on the current screen:\n" + "tap, text, long_press, swipe, stop" + ) + + while ( + user_input.lower() != ActionOp.TAP.value + and user_input.lower() != ActionOp.TEXT.value + and user_input.lower() != ActionOp.LONG_PRESS.value + and user_input.lower() != ActionOp.SWIPE.value + and user_input.lower() != ActionOp.STOP.value + ): + user_input = input("user_input: ") + + if user_input.lower() == ActionOp.TAP.value: + logger.info(f"Which element do you want to tap? Choose a numeric tag from 1 to {len(elem_list)}:") + user_input = "xxx" + while not user_input.isnumeric() or int(user_input) > len(elem_list) or int(user_input) < 1: + user_input = input("user_input: ") + tl, br = elem_list[int(user_input) - 1].bbox + x, y = (tl[0] + br[0]) // 2, (tl[1] + br[1]) // 2 + action = EnvAction(action_type=EnvActionType.SYSTEM_TAP, coord=(x, y)) + log_str = f"tap({int(user_input)}):::{elem_list[int(user_input) - 1].uid}\n" + elif user_input.lower() == ActionOp.TEXT.value: + logger.info( + f"Which element do you want to input the text string? Choose a numeric tag from 1 to " + f"{len(elem_list)}:" + ) + input_area = "xxx" + while not input_area.isnumeric() or int(input_area) > len(elem_list) or int(input_area) < 1: + input_area = input("user_input: ") + logger.info("Enter your input text below:") + user_input = "" + while not user_input: + user_input = input("user_input: ") + action = EnvAction(action_type=EnvActionType.USER_INPUT, input_txt=user_input) + log_str = f"text({input_area}:sep:'{user_input}'):::{elem_list[int(input_area) - 1].uid}\n" + elif user_input.lower() == ActionOp.LONG_PRESS.value: + logger.info( + f"Which element do you want to long press? 
Choose a numeric tag from 1 to {len(elem_list)}:" + ) + user_input = "xxx" + while not user_input.isnumeric() or int(user_input) > len(elem_list) or int(user_input) < 1: + user_input = input("user_input: ") + tl, br = elem_list[int(user_input) - 1].bbox + x, y = (tl[0] + br[0]) // 2, (tl[1] + br[1]) // 2 + action = EnvAction(action_type=EnvActionType.USER_LONGPRESS, coord=(x, y)) + log_str = f"long_press({int(user_input)}):::{elem_list[int(user_input) - 1].uid}\n" + elif user_input.lower() == ActionOp.SWIPE.value: + logger.info( + "What is the direction of your swipe? Choose one from the following options:\n" + "up, down, left, right" + ) + user_input = "" + while ( + user_input != SwipeOp.UP.value + and user_input != SwipeOp.DOWN.value + and user_input != SwipeOp.LEFT.value + and user_input != SwipeOp.RIGHT.value + ): + user_input = input("user_input: ") + swipe_dir = user_input + logger.info(f"Which element do you want to swipe? Choose a numeric tag from 1 to {len(elem_list)}:") + while not user_input.isnumeric() or int(user_input) > len(elem_list) or int(user_input) < 1: + user_input = input("user_input: ") + tl, br = elem_list[int(user_input) - 1].bbox + x, y = (tl[0] + br[0]) // 2, (tl[1] + br[1]) // 2 + + action = EnvAction(action_type=EnvActionType.USER_SWIPE, coord=(x, y), orient=swipe_dir) + log_str = f"swipe({int(user_input)}:sep:{swipe_dir}):::{elem_list[int(user_input) - 1].uid}\n" + elif user_input.lower() == ActionOp.STOP.value: + record_file.write("stop\n") + record_file.close() + break + else: + break + + obs, _, _, _, info = env.step(action) + action_res = info["res"] + if action_res == ADB_EXEC_FAIL: + return AndroidActionOutput(action_state=RunState.FAIL) + record_file.write(log_str) + + time.sleep(1) + + return AndroidActionOutput(action_state=RunState.SUCCESS) diff --git a/metagpt/ext/android_assistant/actions/parse_record.py b/metagpt/ext/android_assistant/actions/parse_record.py new file mode 100644 index 0000000000..304daf6556 --- /dev/null +++ b/metagpt/ext/android_assistant/actions/parse_record.py @@ -0,0 +1,137 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : parse record to generate learned standard operations in stage=learn & mode=manual, +# LIKE scripts/document_generation.py + +import ast +import re +from pathlib import Path + +from metagpt.actions.action import Action +from metagpt.config2 import config +from metagpt.ext.android_assistant.actions.parse_record_an import RECORD_PARSE_NODE +from metagpt.ext.android_assistant.prompts.operation_prompt import ( + long_press_doc_template, + refine_doc_suffix, + swipe_doc_template, + tap_doc_template, + text_doc_template, +) +from metagpt.ext.android_assistant.utils.schema import ( + ActionOp, + AndroidActionOutput, + RecordLogItem, + RunState, + SwipeOp, +) +from metagpt.logs import logger +from metagpt.utils.common import encode_image + + +class ParseRecord(Action): + name: str = "ParseRecord" + record_path: Path = "" + task_desc_path: Path = "" + screenshot_before_path: Path = "" + screenshot_after_path: Path = "" + + async def run(self, task_dir: Path, docs_dir: Path): + doc_count = 0 + self.record_path = Path(task_dir) / "record.txt" + self.task_desc_path = Path(task_dir) / "task_desc.txt" + self.screenshot_before_path = Path(task_dir) / "raw_screenshots" + self.screenshot_after_path = Path(task_dir) / "labeled_screenshots" + for path in [self.screenshot_before_path, self.screenshot_after_path]: + path.mkdir(parents=True, exist_ok=True) + + task_desc = self.task_desc_path.read_text() + extra_config 
= config.extra
+
+        with open(self.record_path, "r") as record_file:
+            record_step_count = len(record_file.readlines()) - 1
+            record_file.seek(0)
+            for step in range(1, record_step_count + 1):
+                img_before_base64 = encode_image(self.screenshot_after_path.joinpath(f"{step}_labeled.png"))
+                img_after_base64 = encode_image(self.screenshot_after_path.joinpath(f"{step + 1}_labeled.png"))
+                rec = record_file.readline().strip()
+                action, resource_id = rec.split(":::")
+                action_type = action.split("(")[0]
+                # build the prompt for this recorded action
+                action_param = re.findall(r"\((.*?)\)", action)[0]
+                if action_type == ActionOp.TAP.value:
+                    prompt_template = tap_doc_template
+                    context = prompt_template.format(ui_element=action_param)
+                elif action_type == ActionOp.TEXT.value:
+                    input_area, input_text = action_param.split(":sep:")
+                    prompt_template = text_doc_template
+                    context = prompt_template.format(ui_element=input_area)
+                elif action_type == ActionOp.LONG_PRESS.value:
+                    prompt_template = long_press_doc_template
+                    context = prompt_template.format(ui_element=action_param)
+                elif action_type == ActionOp.SWIPE.value:
+                    swipe_area, swipe_dir = action_param.split(":sep:")
+                    if swipe_dir == SwipeOp.UP.value or swipe_dir == SwipeOp.DOWN.value:
+                        action_type = ActionOp.VERTICAL_SWIPE.value
+                    elif swipe_dir == SwipeOp.LEFT.value or swipe_dir == SwipeOp.RIGHT.value:
+                        action_type = ActionOp.HORIZONTAL_SWIPE.value
+                    prompt_template = swipe_doc_template
+                    context = prompt_template.format(swipe_dir=swipe_dir, ui_element=swipe_area)
+                else:
+                    break
+                context = context.format(task_desc=task_desc)
+
+                doc_name = resource_id + ".txt"
+                doc_path = docs_dir.joinpath(doc_name)
+
+                if doc_path.exists():
+                    try:
+                        doc_content = ast.literal_eval(doc_path.read_text())
+                    except Exception as exp:
+                        logger.error(f"ast parse doc: {doc_path} failed, exp: {exp}")
+                        continue
+
+                    if doc_content[action_type]:
+                        if extra_config.get("doc_refine", False):
+                            refine_context = refine_doc_suffix.format(old_doc=doc_content[action_type])
+                            context += refine_context
+                            logger.info(
+                                f"Documentation for the element {resource_id} already exists. The doc will be "
+                                f"refined based on the latest demo."
+                            )
+                        else:
+                            logger.info(
+                                f"Documentation for the element {resource_id} already exists. Turn on DOC_REFINE "
+                                f"in the config file if needed."
+                            )
+                            continue
+                else:
+                    doc_content = {"tap": "", "text": "", "v_swipe": "", "h_swipe": "", "long_press": ""}
+
+                logger.info(f"Waiting for GPT-4V to generate documentation for the element {resource_id}")
+                node = await RECORD_PARSE_NODE.fill(
+                    context=context, llm=self.llm, images=[img_before_base64, img_after_base64]
+                )
+                if "error" in node.content:
+                    return AndroidActionOutput(action_state=RunState.FAIL)
+                log_path = task_dir.joinpath("log_parse_record.txt")
+                prompt = node.compile(context=context, schema="json", mode="auto")
+                msg = node.content
+                doc_content[action_type] = msg
+
+                with open(log_path, "a") as logfile:
+                    log_item = RecordLogItem(
+                        step=step,
+                        prompt=prompt,
+                        image_before=img_before_base64,
+                        image_after=img_after_base64,
+                        response=node.content,
+                    )
+                    logfile.write(log_item.model_dump_json() + "\n")
+                with open(doc_path, "w") as outfile:
+                    outfile.write(str(doc_content))
+                doc_count += 1
+                logger.info(f"Documentation generated and saved to {doc_path}")
+
+        logger.info(f"Documentation generation phase completed.
{doc_count} docs generated.") + + return AndroidActionOutput(action_state=RunState.FINISH) diff --git a/metagpt/ext/android_assistant/actions/parse_record_an.py b/metagpt/ext/android_assistant/actions/parse_record_an.py new file mode 100644 index 0000000000..210c93e236 --- /dev/null +++ b/metagpt/ext/android_assistant/actions/parse_record_an.py @@ -0,0 +1,32 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : the ActionNode to parse record + +from metagpt.actions.action_node import ActionNode + +OBSERVATION = ActionNode( + key="Observation", + expected_type=str, + instruction="Provide a description of your observations of the two images. " + "Subsequently, delineate the distinctions between the first image and the second one.", + example="", +) + +THOUGHT = ActionNode( + key="Thought", + expected_type=str, + instruction="Consider the impact of Action acting on UI elements.", + example="", +) + +DESCRIPTION = ActionNode( + key="Description", + expected_type=str, + instruction="Describe the functionality of the UI element concisely in one or two sentences Do not include " + "the numeric tag in your description", + example="", +) + +NODES = [OBSERVATION, THOUGHT, DESCRIPTION] + +RECORD_PARSE_NODE = ActionNode.from_children("RecordParse", NODES) diff --git a/metagpt/ext/android_assistant/actions/screenshot_parse.py b/metagpt/ext/android_assistant/actions/screenshot_parse.py new file mode 100644 index 0000000000..4d8bb0e1eb --- /dev/null +++ b/metagpt/ext/android_assistant/actions/screenshot_parse.py @@ -0,0 +1,204 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : LIKE scripts/task_executor.py in stage=act + +import ast +from pathlib import Path + +from metagpt.actions.action import Action +from metagpt.config2 import config +from metagpt.environment.android.android_env import AndroidEnv +from metagpt.environment.android.const import ADB_EXEC_FAIL +from metagpt.environment.android.env_space import ( + EnvAction, + EnvActionType, + EnvObsParams, + EnvObsType, +) +from metagpt.ext.android_assistant.actions.screenshot_parse_an import ( + SCREENSHOT_PARSE_NODE, +) +from metagpt.ext.android_assistant.prompts.assistant_prompt import ( + screenshot_parse_template, + screenshot_parse_with_grid_template, +) +from metagpt.ext.android_assistant.utils.schema import ( + AndroidActionOutput, + AndroidElement, + GridOpParam, + LongPressGridOpParam, + LongPressOpParam, + OpLogItem, + RunState, + SwipeGridOpParam, + SwipeOpParam, + TapGridOpParam, + TapOpParam, + TextOpParam, +) +from metagpt.ext.android_assistant.utils.utils import ( + area_to_xy, + draw_bbox_multi, + draw_grid, + elem_bbox_to_xy, + screenshot_parse_extract, + traverse_xml_tree, +) +from metagpt.logs import logger +from metagpt.utils.common import encode_image + + +class ScreenshotParse(Action): + name: str = "ScreenshotParse" + + def _makeup_ui_document(self, elem_list: list[AndroidElement], docs_idr: Path, use_exist_doc: bool = True) -> str: + if not use_exist_doc: + return "" + + ui_doc = """ +You also have access to the following documentations that describes the functionalities of UI +elements you can interact on the screen. These docs are crucial for you to determine the target of your +next action. 
You should always prioritize these documented elements for interaction: """ + for i, elem in enumerate(elem_list): + doc_path = docs_idr.joinpath(f"{elem.uid}.txt") + if not doc_path.exists(): + continue + try: + doc_content = ast.literal_eval(doc_path.read_text()) + except Exception as exp: + logger.error(f"ast parse doc: {doc_path} failed, exp: {exp}") + continue + + ui_doc += f"Documentation of UI element labeled with the numeric tag '{i + 1}':\n" + if doc_content["tap"]: + ui_doc += f"This UI element is clickable. {doc_content['tap']}\n\n" + if doc_content["text"]: + ui_doc += ( + f"This UI element can receive text input. The text input is used for the following " + f"purposes: {doc_content['text']}\n\n" + ) + if doc_content["long_press"]: + ui_doc += f"This UI element is long clickable. {doc_content['long_press']}\n\n" + if doc_content["v_swipe"]: + ui_doc += ( + f"This element can be swiped directly without tapping. You can swipe vertically on " + f"this UI element. {doc_content['v_swipe']}\n\n" + ) + if doc_content["h_swipe"]: + ui_doc += ( + f"This element can be swiped directly without tapping. You can swipe horizontally on " + f"this UI element. {doc_content['h_swipe']}\n\n" + ) + return ui_doc + + async def run( + self, + round_count: int, + task_desc: str, + last_act: str, + task_dir: Path, + docs_dir: Path, + grid_on: bool, + env: AndroidEnv, + ): + extra_config = config.extra + for path in [task_dir, docs_dir]: + path.mkdir(parents=True, exist_ok=True) + screenshot_path: Path = env.observe( + EnvObsParams(obs_type=EnvObsType.GET_SCREENSHOT, ss_name=f"{round_count}_before", local_save_dir=task_dir) + ) + xml_path: Path = env.observe( + EnvObsParams(obs_type=EnvObsType.GET_XML, xml_name=f"{round_count}", local_save_dir=task_dir) + ) + if not screenshot_path.exists() or not xml_path.exists(): + return AndroidActionOutput(action_state=RunState.FAIL) + + clickable_list = [] + focusable_list = [] + traverse_xml_tree(xml_path, clickable_list, "clickable", True) + traverse_xml_tree(xml_path, focusable_list, "focusable", True) + elem_list: list[AndroidElement] = clickable_list.copy() + for elem in focusable_list: + bbox = elem.bbox + center = (bbox[0][0] + bbox[1][0]) // 2, (bbox[0][1] + bbox[1][1]) // 2 + close = False + for e in clickable_list: + bbox = e.bbox + center_ = (bbox[0][0] + bbox[1][0]) // 2, (bbox[0][1] + bbox[1][1]) // 2 + dist = (abs(center[0] - center_[0]) ** 2 + abs(center[1] - center_[1]) ** 2) ** 0.5 + if dist <= extra_config.get("min_dist", 30): + close = True + break + if not close: + elem_list.append(elem) + + screenshot_labeled_path = task_dir.joinpath(f"{round_count}_labeled.png") + draw_bbox_multi(screenshot_path, screenshot_labeled_path, elem_list) + img_base64 = encode_image(screenshot_labeled_path) + + parse_template = screenshot_parse_with_grid_template if grid_on else screenshot_parse_template + + if grid_on: + env.rows, env.cols = draw_grid(screenshot_path, task_dir / f"{round_count}_grid.png") + + ui_doc = self._makeup_ui_document(elem_list, docs_dir) + context = parse_template.format(ui_document=ui_doc, task_description=task_desc, last_act=last_act) + node = await SCREENSHOT_PARSE_NODE.fill(context=context, llm=self.llm, images=[img_base64]) + + if "error" in node.content: + return AndroidActionOutput(action_state=RunState.FAIL) + + prompt = node.compile(context=context, schema="json", mode="auto") + OpLogItem(step=round_count, prompt=prompt, image=str(screenshot_labeled_path), response=node.content) + + op_param = 
screenshot_parse_extract(node.instruct_content.model_dump(), grid_on) + if op_param.param_state == RunState.FINISH: + logger.info(f"op_param: {op_param}") + return AndroidActionOutput(action_state=RunState.FINISH) + if op_param.param_state == RunState.FAIL: + return AndroidActionOutput(action_state=RunState.FAIL) + + last_act = op_param.last_act + if isinstance(op_param, TapOpParam): + x, y = elem_bbox_to_xy(elem_list[op_param.area - 1].bbox) + action = EnvAction(action_type=EnvActionType.SYSTEM_TAP, coord=(x, y)) + elif isinstance(op_param, TextOpParam): + action = EnvAction(action_type=EnvActionType.USER_INPUT, input_txt=op_param.input_str) + elif isinstance(op_param, LongPressOpParam): + x, y = elem_bbox_to_xy(elem_list[op_param.area - 1].bbox) + action = EnvAction(action_type=EnvActionType.USER_LONGPRESS, coord=(x, y)) + elif isinstance(op_param, SwipeOpParam): + x, y = elem_bbox_to_xy(elem_list[op_param.area - 1].bbox) + action = EnvAction( + action_type=EnvActionType.USER_SWIPE, coord=(x, y), orient=op_param.swipe_orient, dist=op_param.dist + ) + elif isinstance(op_param, GridOpParam): + grid_on = True + elif isinstance(op_param, TapGridOpParam) or isinstance(op_param, LongPressGridOpParam): + x, y = area_to_xy(op_param.area, op_param.subarea, env.width, env.height, env.rows, env.cols) + if isinstance(op_param, TapGridOpParam): + action = EnvAction(action_type=EnvActionType.SYSTEM_TAP, coord=(x, y)) + else: + # LongPressGridOpParam + action = EnvAction(action_type=EnvActionType.USER_LONGPRESS, coord=(x, y)) + elif isinstance(op_param, SwipeGridOpParam): + start_x, start_y = area_to_xy( + op_param.start_area, op_param.start_subarea, env.width, env.height, env.rows, env.cols + ) + end_x, end_y = area_to_xy( + op_param.end_area, op_param.end_subarea, env.width, env.height, env.rows, env.cols + ) + action = EnvAction( + action_type=EnvActionType.USER_SWIPE_TO, coord=(start_x, start_y), tgt_coord=(end_x, end_y) + ) + + if not grid_on: + obs, _, _, _, info = env.step(action) + action_res = info["res"] + if action_res == ADB_EXEC_FAIL: + return AndroidActionOutput(action_state=RunState.FAIL) + + if op_param.act_name != "grid": + grid_on = False + + return AndroidActionOutput(data={"grid_on": grid_on, "last_act": last_act}) diff --git a/metagpt/ext/android_assistant/actions/screenshot_parse_an.py b/metagpt/ext/android_assistant/actions/screenshot_parse_an.py new file mode 100644 index 0000000000..eb23ba9344 --- /dev/null +++ b/metagpt/ext/android_assistant/actions/screenshot_parse_an.py @@ -0,0 +1,48 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : the ActionNode to parse screenshot + +from metagpt.actions.action_node import ActionNode + +OBSERVATION = ActionNode( + key="Observation", expected_type=str, instruction="Describe what you observe in the image", example="" +) + +THOUGHT = ActionNode( + key="Thought", + expected_type=str, + instruction="To complete the given task, what is the next step I should do", + example="", +) + +ACTION = ActionNode( + key="Action", + expected_type=str, + instruction="The function call with the correct parameters to proceed with the task. If you believe the task is " + "completed or there is nothing to be done, you should output FINISH. You cannot output anything else " + "except a function call or FINISH in this field.", + example="", +) + +SUMMARY = ActionNode( + key="Summary", + expected_type=str, + instruction="Summarize your past actions along with your latest action in one or two sentences. 
Do not include " + "the numeric tag in your summary", + example="", +) + +SUMMARY_GRID = ActionNode( + key="Summary", + expected_type=str, + instruction="Summarize your past actions along with your latest action in one or two sentences. Do not include " + "the grid area number in your summary", + example="", +) + +NODES = [OBSERVATION, THOUGHT, ACTION, SUMMARY] + +NODES_GRID = [OBSERVATION, THOUGHT, ACTION, SUMMARY_GRID] + +SCREENSHOT_PARSE_NODE = ActionNode.from_children("ScreenshotParse", NODES) +SCREENSHOT_PARSE_GRID_NODE = ActionNode.from_children("ScreenshotParseGrid", NODES_GRID) diff --git a/metagpt/ext/android_assistant/actions/self_learn_and_reflect.py b/metagpt/ext/android_assistant/actions/self_learn_and_reflect.py new file mode 100644 index 0000000000..5e9cfbb454 --- /dev/null +++ b/metagpt/ext/android_assistant/actions/self_learn_and_reflect.py @@ -0,0 +1,231 @@ +# !/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : LIKE scripts/self_explorer.py in stage=learn & mode=auto self_explore_task stage + +import ast +from pathlib import Path + +from metagpt.actions.action import Action +from metagpt.config2 import config +from metagpt.environment.android.android_env import AndroidEnv +from metagpt.environment.android.const import ADB_EXEC_FAIL +from metagpt.environment.android.env_space import ( + EnvAction, + EnvActionType, + EnvObsParams, + EnvObsType, +) +from metagpt.ext.android_assistant.actions.screenshot_parse_an import ( + SCREENSHOT_PARSE_NODE, +) +from metagpt.ext.android_assistant.actions.self_learn_reflect_an import ( + SELF_LEARN_REFLECT_NODE, +) +from metagpt.ext.android_assistant.prompts.assistant_prompt import ( + screenshot_parse_self_explore_reflect_template as reflect_template, +) +from metagpt.ext.android_assistant.prompts.assistant_prompt import ( + screenshot_parse_self_explore_template, +) +from metagpt.ext.android_assistant.utils.schema import ( + ActionOp, + AndroidActionOutput, + AndroidElement, + Decision, + DocContent, + LongPressOpParam, + OpLogItem, + ReflectLogItem, + RunState, + SwipeOp, + SwipeOpParam, + TapOpParam, + TextOpParam, +) +from metagpt.ext.android_assistant.utils.utils import ( + draw_bbox_multi, + elem_bbox_to_xy, + elem_list_from_xml_tree, + reflect_parse_extarct, + screenshot_parse_extract, +) +from metagpt.logs import logger +from metagpt.utils.common import encode_image + + +class SelfLearnAndReflect(Action): + name: str = "SelfLearnAndReflect" + + useless_list: list[str] = [] # store useless elements uid + + screenshot_before_path: str = "" + screenshot_before_base64: str = "" + elem_list: list[AndroidElement] = [] + swipe_orient: str = "up" + act_name: str = "" + ui_area: int = -1 + + async def run( + self, round_count: int, task_desc: str, last_act: str, task_dir: Path, docs_dir: Path, env: AndroidEnv + ) -> AndroidActionOutput: + for path in [task_dir, docs_dir]: + path.mkdir(parents=True, exist_ok=True) + resp = await self.run_self_learn(round_count, task_desc, last_act, task_dir, env) + if resp.action_state != RunState.SUCCESS: + return resp + + resp = await self.run_reflect(round_count, task_desc, last_act, task_dir, docs_dir, env) + return resp + + async def run_self_learn( + self, round_count: int, task_desc: str, last_act: str, task_dir: Path, env: AndroidEnv + ) -> AndroidActionOutput: + extra_config = config.extra + screenshot_path: Path = env.observe( + EnvObsParams(obs_type=EnvObsType.GET_SCREENSHOT, ss_name=f"{round_count}_before", local_save_dir=task_dir) + ) + xml_path: Path = env.observe( + 
EnvObsParams(obs_type=EnvObsType.GET_XML, xml_name=f"{round_count}", local_save_dir=task_dir) + ) + if not screenshot_path.exists() or not xml_path.exists(): + return AndroidActionOutput(action_state=RunState.FAIL) + + elem_list = elem_list_from_xml_tree(xml_path, self.useless_list, extra_config.get("min_dist", 30)) + + screenshot_before_labeled_path = task_dir.joinpath(f"{round_count}_before_labeled.png") + draw_bbox_multi(screenshot_path, screenshot_before_labeled_path, elem_list) + img_base64 = encode_image(screenshot_before_labeled_path) + self.screenshot_before_base64 = img_base64 + self.screenshot_before_path = screenshot_before_labeled_path + + self_explore_template = screenshot_parse_self_explore_template + context = self_explore_template.format(task_description=task_desc, last_act=last_act) + + node = await SCREENSHOT_PARSE_NODE.fill(context=context, llm=self.llm, images=[img_base64]) + logger.debug(f"fill result:{node}") + if "error" in node.content: + return AndroidActionOutput(action_state=RunState.FAIL) + prompt = node.compile(context=context, schema="json", mode="auto") + # Modify WindowsPath to Str + OpLogItem(step=round_count, prompt=prompt, image=str(screenshot_before_labeled_path), response=node.content) + op_param = screenshot_parse_extract(node.instruct_content.model_dump(), grid_on=False) + # TODO Modify Op_param. When op_param.action is FINISH, how to solve this ? + if op_param.param_state == RunState.FINISH: + return AndroidActionOutput(action_state=RunState.FINISH) + if op_param.param_state == RunState.FAIL: + return AndroidActionOutput(action_state=RunState.FAIL) + + if isinstance(op_param, TapOpParam): + self.ui_area = op_param.area + x, y = elem_bbox_to_xy(elem_list[op_param.area - 1].bbox) + action = EnvAction(action_type=EnvActionType.SYSTEM_TAP, coord=(x, y)) + elif isinstance(op_param, TextOpParam): + action = EnvAction(action_type=EnvActionType.USER_INPUT, input_txt=op_param.input_str) + elif isinstance(op_param, LongPressOpParam): + self.ui_area = op_param.area + x, y = elem_bbox_to_xy(elem_list[op_param.area - 1].bbox) + action = EnvAction(action_type=EnvActionType.USER_LONGPRESS, coord=(x, y)) + elif isinstance(op_param, SwipeOpParam): + self.ui_area = op_param.area + self.swipe_orient = op_param.swipe_orient + x, y = elem_bbox_to_xy(elem_list[op_param.area - 1].bbox) + action = EnvAction( + action_type=EnvActionType.USER_SWIPE, coord=(x, y), orient=op_param.swipe_orient, dist=op_param.dist + ) + + obs, _, _, _, info = env.step(action) + action_res = info["res"] + if action_res == ADB_EXEC_FAIL: + return AndroidActionOutput(action_state=RunState.FAIL) + + self.elem_list = elem_list + self.act_name = op_param.act_name + return AndroidActionOutput() + + async def run_reflect( + self, round_count: int, task_desc: str, last_act: str, task_dir: Path, docs_dir: Path, env: AndroidEnv + ) -> AndroidActionOutput: + screenshot_path: Path = env.observe( + EnvObsParams(obs_type=EnvObsType.GET_SCREENSHOT, ss_name=f"{round_count}_after", local_save_dir=task_dir) + ) + if not screenshot_path.exists(): + return AndroidActionOutput(action_state=RunState.FAIL) + + screenshot_after_labeled_path = task_dir.joinpath(f"{round_count}_after_labeled.png") + draw_bbox_multi(screenshot_path, screenshot_after_labeled_path, elem_list=self.elem_list) + img_base64 = encode_image(screenshot_after_labeled_path) + if self.act_name == ActionOp.TAP.value: + action = "tapping" + elif self.act_name == ActionOp.LONG_PRESS.value: + action = "long pressing" + elif self.act_name == 
ActionOp.SWIPE.value:
+            action = "swiping"
+            if self.swipe_orient == SwipeOp.UP.value or self.swipe_orient == SwipeOp.DOWN.value:
+                action = "v_swipe"
+            elif self.swipe_orient == SwipeOp.LEFT.value or self.swipe_orient == SwipeOp.RIGHT.value:
+                action = "h_swipe"
+        else:
+            # TODO: test this branch; this failure is coupled with the IndexError noted below
+            logger.warning(f"Failed to parse the current action name: `{self.act_name}`")
+            action = None
+        context = reflect_template.format(
+            action=action, ui_element=str(self.ui_area), task_desc=task_desc, last_act=last_act
+        )
+        node = await SELF_LEARN_REFLECT_NODE.fill(
+            context=context, llm=self.llm, images=[self.screenshot_before_base64, img_base64]
+        )
+
+        if "error" in node.content:
+            return AndroidActionOutput(action_state=RunState.FAIL)
+
+        prompt = node.compile(context=context, schema="json", mode="auto")
+        ReflectLogItem(
+            step=round_count,
+            prompt=prompt,
+            image_before=str(self.screenshot_before_path),
+            image_after=str(screenshot_after_labeled_path),
+            response=node.content,
+        )
+
+        op_param = reflect_parse_extarct(node.instruct_content.model_dump())
+        if op_param.param_state == RunState.FINISH:
+            return AndroidActionOutput(action_state=RunState.FINISH)
+        if op_param.param_state == RunState.FAIL:
+            return AndroidActionOutput(action_state=RunState.FAIL)
+
+        logger.info(
+            f"reflect_parse_extarct decision: {op_param.decision}, "
+            f"elem_list size: {len(self.elem_list)}, ui_area: {self.ui_area}"
+        )
+        # TODO: this can raise `IndexError: list index out of range`;
+        # if it happens, you may need to click back to the home screen in the emulator
+        resource_id = self.elem_list[int(self.ui_area) - 1].uid
+        if op_param.decision == Decision.INEFFECTIVE.value:
+            self.useless_list.append(resource_id)
+            last_act = "NONE"  # TODO: share last_act globally
+        elif op_param.decision in [Decision.BACK.value, Decision.CONTINUE.value, Decision.SUCCESS.value]:
+            if op_param.decision in [Decision.BACK.value, Decision.CONTINUE.value]:
+                self.useless_list.append(resource_id)
+                last_act = "NONE"
+                if op_param.decision == Decision.BACK.value:
+                    action = EnvAction(action_type=EnvActionType.SYSTEM_BACK)
+                    obs, _, _, _, info = env.step(action)
+                    if info["res"] == ADB_EXEC_FAIL:
+                        return AndroidActionOutput(action_state=RunState.FAIL)
+            doc = op_param.documentation
+            doc_path = docs_dir.joinpath(f"{resource_id}.txt")
+            if doc_path.exists():
+                try:
+                    doc_content = ast.literal_eval(doc_path.read_text())
+                except Exception as exp:
+                    logger.error(f"ast parse doc: {doc_path} failed, exp: {exp}")
+                    return AndroidActionOutput(action_state=RunState.FAIL)
+
+                if doc_content[self.act_name]:
+                    logger.info(f"Documentation for the element {resource_id} already exists.")
+                    return AndroidActionOutput(action_state=RunState.FAIL)
+            else:
+                doc_content = DocContent()
+            setattr(doc_content, self.act_name, doc)
+            doc_path.write_text(str(doc_content))
+        return AndroidActionOutput(data={"last_act": last_act})
diff --git a/metagpt/ext/android_assistant/actions/self_learn_reflect_an.py b/metagpt/ext/android_assistant/actions/self_learn_reflect_an.py
new file mode 100644
index 0000000000..305b7376af
--- /dev/null
+++ b/metagpt/ext/android_assistant/actions/self_learn_reflect_an.py
@@ -0,0 +1,21 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc   : the ActionNode to parse Reflection
+
+from metagpt.actions.action_node import ActionNode
+
+DECISION = ActionNode(
+    key="Decision", expected_type=str, instruction="the decision you made: one of BACK, INEFFECTIVE, CONTINUE or SUCCESS", example="BACK"
+)
+
+
+THOUGHT = ActionNode(key="Thought", expected_type=str,
instruction="explain why you made this decision", example="") + + +DOCUMENTATION = ActionNode( + key="Documentation", expected_type=str, instruction="describe the function of the UI element", example="" +) + + +NODES = [DECISION, THOUGHT, DOCUMENTATION] +SELF_LEARN_REFLECT_NODE = ActionNode.from_children("SelfLearnReflect", NODES) diff --git a/metagpt/ext/android_assistant/prompts/__init__.py b/metagpt/ext/android_assistant/prompts/__init__.py new file mode 100644 index 0000000000..2bcf8efd09 --- /dev/null +++ b/metagpt/ext/android_assistant/prompts/__init__.py @@ -0,0 +1,3 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : diff --git a/metagpt/ext/android_assistant/prompts/assistant_prompt.py b/metagpt/ext/android_assistant/prompts/assistant_prompt.py new file mode 100644 index 0000000000..34baf58417 --- /dev/null +++ b/metagpt/ext/android_assistant/prompts/assistant_prompt.py @@ -0,0 +1,168 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : the prompt templates of assistant learning and acting + +screenshot_parse_template = """You are an agent that is trained to perform some basic tasks on a smartphone. You will be given a +smartphone screenshot. The interactive UI elements on the screenshot are labeled with numeric tags starting from 1. The +numeric tag of each interactive element is located in the center of the element. + +You can call the following functions to control the smartphone: + +1. tap(element: int) +This function is used to tap an UI element shown on the smartphone screen. +"element" is a numeric tag assigned to an UI element shown on the smartphone screen. +A simple use case can be tap(5), which taps the UI element labeled with the number 5. + +2. text(text_input: str) +This function is used to insert text input in an input field/box. text_input is the string you want to insert and must +be wrapped with double quotation marks. A simple use case can be text("Hello, world!"), which inserts the string +"Hello, world!" into the input area on the smartphone screen. This function is usually callable when you see a keyboard +showing in the lower half of the screen. + +3. long_press(element: int) +This function is used to long press an UI element shown on the smartphone screen. +"element" is a numeric tag assigned to an UI element shown on the smartphone screen. +A simple use case can be long_press(5), which long presses the UI element labeled with the number 5. + +4. swipe(element: int, direction: str, dist: str) +This function is used to swipe an UI element shown on the smartphone screen, usually a scroll view or a slide bar. +"element" is a numeric tag assigned to an UI element shown on the smartphone screen. "direction" is a string that +represents one of the four directions: up, down, left, right. "direction" must be wrapped with double quotation +marks. "dist" determines the distance of the swipe and can be one of the three options: short, medium, long. You should +choose the appropriate distance option according to your need. +A simple use case can be swipe(21, "up", "medium"), which swipes up the UI element labeled with the number 21 for a +medium distance. + +5. grid() +You should call this function when you find the element you want to interact with is not labeled with a numeric tag and +other elements with numeric tags cannot help with the task. The function will bring up a grid overlay to divide the +smartphone screen into small areas and this will give you more freedom to choose any part of the screen to tap, long +press, or swipe. 
+{ui_document}
+The task you need to complete is to: {task_description}. Your past actions to proceed with this task are summarized as
+follows: {last_act}
+Now, given the documentation and the following labeled screenshot, you need to think and call the function needed to
+proceed with the task. Your output should include three parts in the given format:
+Observation: <describe what you observe in the image>
+Thought: <to complete the given task, what is the next step I should do>
+Action: <the function call with the correct parameters to proceed with the task, or FINISH if the task is completed>
+Summary: <summarize your past actions along with your latest action in one or two sentences; do not include the numeric tag>
+
+You can only take one action at a time, so please directly call the function."""
+
+screenshot_parse_with_grid_template = """You are an agent that is trained to perform some basic tasks on a smartphone. You will be given
+a smartphone screenshot overlaid by a grid. The grid divides the screenshot into small square areas. Each area is
+labeled with an integer in the top-left corner.
+
+You can call the following functions to control the smartphone:
+
+1. tap(area: int, subarea: str)
+This function is used to tap a grid area shown on the smartphone screen. "area" is the integer label assigned to a grid
+area shown on the smartphone screen. "subarea" is a string representing the exact location to tap within the grid area.
+It can take one of the nine values: center, top-left, top, top-right, left, right, bottom-left, bottom, and
+bottom-right.
+A simple use case can be tap(5, "center"), which taps the exact center of the grid area labeled with the number 5.
+
+2. long_press(area: int, subarea: str)
+This function is used to long press a grid area shown on the smartphone screen. "area" is the integer label assigned to
+a grid area shown on the smartphone screen. "subarea" is a string representing the exact location to long press within
+the grid area. It can take one of the nine values: center, top-left, top, top-right, left, right, bottom-left, bottom,
+and bottom-right.
+A simple use case can be long_press(7, "top-left"), which long presses the top left part of the grid area labeled with
+the number 7.
+
+3. swipe(start_area: int, start_subarea: str, end_area: int, end_subarea: str)
+This function is used to perform a swipe action on the smartphone screen, especially when you want to interact with a
+scroll view or a slide bar. "start_area" is the integer label assigned to the grid area which marks the starting
+location of the swipe. "start_subarea" is a string representing the exact location to begin the swipe within the grid
+area. "end_area" is the integer label assigned to the grid area which marks the ending location of the swipe.
+"end_subarea" is a string representing the exact location to end the swipe within the grid area.
+The two subarea parameters can take one of the nine values: center, top-left, top, top-right, left, right, bottom-left,
+bottom, and bottom-right.
+A simple use case can be swipe(21, "center", 25, "right"), which performs a swipe starting from the center of grid area
+21 to the right part of grid area 25.
+
+The task you need to complete is to: {task_description}. Your past actions to proceed with this task are summarized as
+follows: {last_act}
+Now, given the following labeled screenshot, you need to think and call the function needed to proceed with the task.
+Your output should include three parts in the given format:
+Observation: <describe what you observe in the image>
+Thought: <to complete the given task, what is the next step I should do>
+Action: <the function call with the correct parameters to proceed with the task, or FINISH if the task is completed>
+Summary: <summarize your past actions along with your latest action in one or two sentences; do not include the grid area number>
+
+You can only take one action at a time, so please directly call the function."""
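For concreteness, this is roughly how `ScreenshotParse.run()` earlier in this PR consumes these templates and routes the result through the ActionNode (condensed from that async method; the argument values here are made up):

```python
# Condensed from ScreenshotParse.run() above; runs inside an async method.
context = screenshot_parse_template.format(
    ui_document=ui_doc,                     # output of _makeup_ui_document()
    task_description="Send 'hi' to Alice",  # example task
    last_act="None",
)
node = await SCREENSHOT_PARSE_NODE.fill(context=context, llm=self.llm, images=[img_base64])
op_param = screenshot_parse_extract(node.instruct_content.model_dump(), grid_on=False)
```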
+ +You can call the following functions to interact with those labeled elements to control the smartphone: + +1. tap(element: int) +This function is used to tap an UI element shown on the smartphone screen. +"element" is a numeric tag assigned to an UI element shown on the smartphone screen. +A simple use case can be tap(5), which taps the UI element labeled with the number 5. + +2. text(text_input: str) +This function is used to insert text input in an input field/box. text_input is the string you want to insert and must +be wrapped with double quotation marks. A simple use case can be text("Hello, world!"), which inserts the string +"Hello, world!" into the input area on the smartphone screen. This function is only callable when you see a keyboard +showing in the lower half of the screen. + +3. long_press(element: int) +This function is used to long press an UI element shown on the smartphone screen. +"element" is a numeric tag assigned to an UI element shown on the smartphone screen. +A simple use case can be long_press(5), which long presses the UI element labeled with the number 5. + +4. swipe(element: int, direction: str, dist: str) +This function is used to swipe an UI element shown on the smartphone screen, usually a scroll view or a slide bar. +"element" is a numeric tag assigned to an UI element shown on the smartphone screen. "direction" is a string that +represents one of the four directions: up, down, left, right. "direction" must be wrapped with double quotation +marks. "dist" determines the distance of the swipe and can be one of the three options: short, medium, long. You should +choose the appropriate distance option according to your need. +A simple use case can be swipe(21, "up", "medium"), which swipes up the UI element labeled with the number 21 for a +medium distance. + +The task you need to complete is to {task_description}. Your past actions to proceed with this task are summarized as +follows: {last_act} +Now, given the following labeled screenshot, you need to think and call the function needed to proceed with the task. +Your output should include three parts in the given format: + +You can only take one action at a time, so please directly call the function.""" + +screenshot_parse_self_explore_reflect_template = """I will give you screenshots of a mobile app before and after {action} the UI +element labeled with the number '{ui_element}' on the first screenshot. The numeric tag of each element is located at +the center of the element. The action of {action} this UI element was described as follows: +{last_act} +The action was also an attempt to proceed with a larger task, which is to {task_desc}. Your job is to carefully analyze +the difference between the two screenshots to determine if the action is in accord with the description above and at +the same time effectively moved the task forward. Your output should be determined based on the following situations: +1. BACK +If you think the action navigated you to a page where you cannot proceed with the given task, you should go back to the +previous interface. At the same time, describe the functionality of the UI element concisely in one or two sentences by +observing the difference between the two screenshots. Notice that your description of the UI element should focus on +the general function. Never include the numeric tag of the UI element in your description. You can use pronouns such as +"the UI element" to refer to the element. 
Your output should be in the following format: +Decision: BACK +Thought: +Documentation: +2. INEFFECTIVE +If you find the action changed nothing on the screen (screenshots before and after the action are identical), you +should continue to interact with other elements on the screen. Notice that if you find the location of the cursor +changed between the two screenshots, then they are not identical. Your output should be in the following format: +Decision: INEFFECTIVE +Thought: +Documentation: +3. CONTINUE +If you find the action changed something on the screen but did not reflect the action description above and did not +move the given task forward, you should continue to interact with other elements on the screen. At the same time, +describe the functionality of the UI element concisely in one or two sentences by observing the difference between the +two screenshots. Notice that your description of the UI element should focus on the general function. Never include the +numeric tag of the UI element in your description. You can use pronouns such as "the UI element" to refer to the +element. Your output should be in the following format: +Decision: CONTINUE +Thought: +Documentation: +4. SUCCESS +If you think the action successfully moved the task forward (even though it did not complete the task), you should +describe the functionality of the UI element concisely in one or two sentences. Notice that your description of the UI +element should focus on the general function. Never include the numeric tag of the UI element in your description. You +can use pronouns such as "the UI element" to refer to the element. Your output should be in the following format: +Decision: SUCCESS +Thought: +Documentation: +""" diff --git a/metagpt/ext/android_assistant/prompts/operation_prompt.py b/metagpt/ext/android_assistant/prompts/operation_prompt.py new file mode 100644 index 0000000000..1bde53f041 --- /dev/null +++ b/metagpt/ext/android_assistant/prompts/operation_prompt.py @@ -0,0 +1,45 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : the prompt templates of phone operation + +tap_doc_template = """I will give you the screenshot of a mobile app before and after tapping the UI element labeled +with the number {ui_element} on the screen. The numeric tag of each element is located at the center of the element. +Tapping this UI element is a necessary part of proceeding with a larger task, which is to <task_desc>. Your task is to +describe the functionality of the UI element concisely in one or two sentences. Notice that your description of the UI +element should focus on the general function. For example, if the UI element is used to navigate to the chat window +with John, your description should not include the name of the specific person. Just say: "Tapping this area will +navigate the user to the chat window". Never include the numeric tag of the UI element in your description. You can use +pronouns such as "the UI element" to refer to the element.""" + +text_doc_template = """I will give you the screenshot of a mobile app before and after typing in the input area labeled +with the number {ui_element} on the screen. The numeric tag of each element is located at the center of the element. +Typing in this UI element is a necessary part of proceeding with a larger task, which is to <task_desc>. Your task is +to describe the functionality of the UI element concisely in one or two sentences. Notice that your description of the +UI element should focus on the general function. 
For example, if the change of the screenshot shows that the user typed +"How are you?" in the chat box, you do not need to mention the actual text. Just say: "This input area is used for the +user to type a message to send to the chat window". Never include the numeric tag of the UI element in your +description. You can use pronouns such as "the UI element" to refer to the element.""" + +long_press_doc_template = """I will give you the screenshot of a mobile app before and after long pressing the UI +element labeled with the number {ui_element} on the screen. The numeric tag of each element is located at the center of +the element. Long pressing this UI element is a necessary part of proceeding with a larger task, which is to +<task_desc>. Your task is to describe the functionality of the UI element concisely in one or two sentences. Notice +that your description of the UI element should focus on the general function. For example, if long pressing the UI +element redirects the user to the chat window with John, your description should not include the name of the specific +person. Just say: "Long pressing this area will redirect the user to the chat window". Never include the numeric tag of +the UI element in your description. You can use pronouns such as "the UI element" to refer to the element.""" + +swipe_doc_template = """I will give you the screenshot of a mobile app before and after swiping the UI +element labeled with the number {ui_element} on the screen. The numeric tag of each element is located at the center of +the element. Swiping this UI element is a necessary part of proceeding with a larger task, which is to <task_desc>. +Your task is to describe the functionality of the UI element concisely in one or two sentences. Notice that your +description of the UI element should be as general as possible. For example, if swiping the UI element increases the +contrast ratio of an image of a building, your description should be just like this: "Swiping this area enables the +user to tune a specific parameter of the image". Never include the numeric tag of the UI element in your description. +You can use pronouns such as "the UI element" to refer to the element.""" + +refine_doc_suffix = """\nA documentation of this UI element generated from previous demos is shown below. Your +generated description should build on this previous doc and improve it. Notice that it is possible that your +understanding of the function of the UI element derived from the given screenshots conflicts with the previous doc, +because the function of a UI element can be flexible. In this case, your generated description should combine both. 
+Old documentation of this UI element: {old_doc}""" diff --git a/metagpt/ext/android_assistant/roles/__init__.py b/metagpt/ext/android_assistant/roles/__init__.py new file mode 100644 index 0000000000..2bcf8efd09 --- /dev/null +++ b/metagpt/ext/android_assistant/roles/__init__.py @@ -0,0 +1,3 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : diff --git a/metagpt/ext/android_assistant/roles/android_assistant.py b/metagpt/ext/android_assistant/roles/android_assistant.py new file mode 100644 index 0000000000..45636f5191 --- /dev/null +++ b/metagpt/ext/android_assistant/roles/android_assistant.py @@ -0,0 +1,146 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : android assistant to learn from app operations and operate apps +import time +from datetime import datetime +from pathlib import Path +from typing import Optional + +from pydantic import Field + +from metagpt.actions.add_requirement import UserRequirement +from metagpt.config2 import config +from metagpt.const import EXAMPLE_PATH +from metagpt.ext.android_assistant.actions.manual_record import ManualRecord +from metagpt.ext.android_assistant.actions.parse_record import ParseRecord +from metagpt.ext.android_assistant.actions.screenshot_parse import ScreenshotParse +from metagpt.ext.android_assistant.actions.self_learn_and_reflect import ( + SelfLearnAndReflect, +) +from metagpt.ext.android_assistant.utils.schema import AndroidActionOutput, RunState +from metagpt.logs import logger +from metagpt.roles.role import Role, RoleReactMode +from metagpt.schema import Message + + +class AndroidAssistant(Role): + name: str = "Nick" + profile: str = "AndroidAssistant" + goal: str = "operate the mobile phone's apps with self-learn" + + task_desc: str = "" + round_count: int = 0 + last_act: str = "None" + output_root_dir: Optional[Path] = Field(default=None) + task_dir: Optional[Path] = Field(default=None) + docs_dir: Optional[Path] = Field(default=None) + grid_on: bool = Field(default=False) + + def __init__(self, **data): + super().__init__(**data) + + self._watch([UserRequirement, AndroidActionOutput]) + extra_config = config.extra + self.task_desc = extra_config.get("task_desc", "Just explore any app in this phone!") + app_name = extra_config.get("app_name", "demo") + if self.output_root_dir: + data_dir = self.output_root_dir.absolute().joinpath("output") + else: + data_dir = EXAMPLE_PATH.joinpath("android_assistant/output") + cur_datetime = datetime.fromtimestamp(int(time.time())).strftime("%Y-%m-%d_%H-%M-%S") + + """For now, the stage is decided by the user config; later this could be automated, e.g. for a new app, + run the learn stage first and then the act stage, or learn during acting. + """ + stage = extra_config.get("stage") + mode = extra_config.get("mode") + if stage == "learn" and mode == "manual": + # choose ManualRecord and then run ParseRecord + # Note: each of these actions runs exactly once, so there is no need to run n_round. 
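+            # Stage/mode dispatch, summarized from the branches below: + #   learn + manual -> ManualRecord then ParseRecord (turn a human demo into element docs) + #   learn + auto   -> SelfLearnAndReflect (explore the app and document it autonomously) + #   act            -> ScreenshotParse, reading manual_docs or auto_docs to match the learn mode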
+ self.set_actions([ManualRecord, ParseRecord]) + self.task_dir = data_dir.joinpath(app_name, f"manual_learn_{cur_datetime}") + self.docs_dir = data_dir.joinpath(app_name, "manual_docs") + elif stage == "learn" and mode == "auto": + # choose SelfLearnAndReflect to run + self.set_actions([SelfLearnAndReflect]) + self.task_dir = data_dir.joinpath(app_name, f"auto_learn_{cur_datetime}") + self.docs_dir = data_dir.joinpath(app_name, "auto_docs") + elif stage == "act": + # choose ScreenshotParse to run + self.set_actions([ScreenshotParse]) + self.task_dir = data_dir.joinpath(app_name, f"act_{cur_datetime}") + if mode == "manual": + self.docs_dir = data_dir.joinpath(app_name, "manual_docs") + else: + self.docs_dir = data_dir.joinpath(app_name, "auto_docs") + else: + raise ValueError(f"invalid stage: {stage}, mode: {mode}") + + self._check_dir() + + self._set_react_mode(RoleReactMode.BY_ORDER) + + def _check_dir(self): + self.task_dir.mkdir(parents=True, exist_ok=True) + self.docs_dir.mkdir(parents=True, exist_ok=True) + + async def react(self) -> Message: + self.round_count += 1 + result = await super().react() + logger.debug(f"react result {result}") + return result + + async def _observe(self, ignore_memory=True) -> int: + """ignore old memory to make it run multi rounds inside a role""" + newest_msgs = self.rc.memory.get(k=1) + newest_msg = newest_msgs[0] if newest_msgs else None + if newest_msg and (RunState.SUCCESS.value.upper() not in newest_msg.content): + ignore_memory = False + state_val = newest_msg.content.split(".")[-1] # RoundCount: 1, action_state: RunState.SUCCESS + logger.warning(f"Latest action_state is {state_val}, will run in the remainder rounds without `react`") + return await super()._observe(ignore_memory) + + async def _act(self) -> Message: + logger.info(f"{self._setting}: to do {self.rc.todo}({self.rc.todo.name})") + todo = self.rc.todo + if isinstance(todo, ManualRecord): + resp = await todo.run(task_dir=self.task_dir, task_desc=self.task_desc, env=self.rc.env) + elif isinstance(todo, ParseRecord): + resp = await todo.run( + task_dir=self.task_dir, + docs_dir=self.docs_dir, + ) + elif isinstance(todo, SelfLearnAndReflect): + resp = await todo.run( + round_count=self.round_count, + task_desc=self.task_desc, + last_act=self.last_act, + task_dir=self.task_dir, + docs_dir=self.docs_dir, + env=self.rc.env, + ) + if resp.action_state == RunState.SUCCESS: + self.last_act = resp.data.get("last_act") + elif isinstance(todo, ScreenshotParse): + resp = await todo.run( + round_count=self.round_count, + task_desc=self.task_desc, + last_act=self.last_act, + task_dir=self.task_dir, + docs_dir=self.docs_dir, + grid_on=self.grid_on, + env=self.rc.env, + ) + if resp.action_state == RunState.SUCCESS: + logger.info(f"grid_on: {resp.data.get('grid_on')}") + self.grid_on = resp.data.get("grid_on", False) + self.last_act = resp.data.get("last_act", "None") + msg = Message( + content=f"RoundCount: {self.round_count}, action_state: {resp.action_state}", + role=self.profile, + cause_by=type(resp), + send_from=self.name, + send_to=self.name, + ) + + self.rc.memory.add(msg) + return msg diff --git a/metagpt/ext/android_assistant/utils/__init__.py b/metagpt/ext/android_assistant/utils/__init__.py new file mode 100644 index 0000000000..2bcf8efd09 --- /dev/null +++ b/metagpt/ext/android_assistant/utils/__init__.py @@ -0,0 +1,3 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : diff --git a/metagpt/ext/android_assistant/utils/schema.py b/metagpt/ext/android_assistant/utils/schema.py new 
file mode 100644 index 0000000000..c066f98b62 --- /dev/null +++ b/metagpt/ext/android_assistant/utils/schema.py @@ -0,0 +1,158 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : + +from enum import Enum + +from pydantic import BaseModel, Field, field_validator + + +class ActionOp(Enum): + TAP = "tap" + LONG_PRESS = "long_press" + TEXT = "text" + SWIPE = "swipe" + VERTICAL_SWIPE = "v_swipe" + HORIZONTAL_SWIPE = "h_swipe" + GRID = "grid" + STOP = "stop" + + +class SwipeOp(Enum): + UP = "up" + DOWN = "down" + LEFT = "left" + RIGHT = "right" + + +class Decision(Enum): + BACK = "BACK" + INEFFECTIVE = "INEFFECTIVE" + CONTINUE = "CONTINUE" + SUCCESS = "SUCCESS" + + @classmethod + def values(cls): + return [item.value for item in cls] + + +class AndroidElement(BaseModel): + """UI Element""" + + uid: str = Field(default="") + bbox: tuple[tuple[int, int], tuple[int, int]] = Field(default=((0, 0), (0, 0))) + attrib: str = Field(default="") + + +class OpLogItem(BaseModel): + """log content for self-learn or task act""" + + step: int = Field(default=0) + prompt: str = Field(default="") + image: str = Field(default="") + response: str = Field(default="") + + +class ReflectLogItem(BaseModel): + """log content for self-learn-reflect""" + + step: int = Field(default=0) + prompt: str = Field(default="") + image_before: str = Field(default="") + image_after: str = Field(default="") + response: str = Field(default="") + + +class RecordLogItem(BaseModel): + """log content for record parse, same as ReflectLogItem""" + + step: int = Field(default=0) + prompt: str = Field(default="") + image_before: str = Field(default="") + image_after: str = Field(default="") + response: str = Field(default="") + + +class DocContent(BaseModel): + tap: str = Field(default="") + text: str = Field(default="") + v_swipe: str = Field(default="") + h_swipe: str = Field(default="") + long_press: str = Field(default="") + + +# start =================== define different Action Op and its params ============= +class RunState(Enum): + """run state""" + + SUCCESS = "success" + FINISH = "finish" + FAIL = "fail" + + +class BaseOpParam(BaseModel): + act_name: str = Field(default="", validate_default=True) + last_act: str = Field(default="None") + param_state: RunState = Field(default=RunState.SUCCESS, description="return state when extract params") + + +class TapOpParam(BaseOpParam): + area: int = Field(default=-1) + + +class TextOpParam(BaseOpParam): + input_str: str = Field(default="") + + +class LongPressOpParam(BaseOpParam): + area: int = Field(default=-1) + + +# TODO: find a clearer name for SwipeOpParam +class SwipeOpParam(BaseOpParam): + area: int = Field(default=-1) + swipe_orient: str = Field(default="up") + dist: str = Field(default="") + + +class GridOpParam(BaseOpParam): + act_name: str = Field(default="") + + +class BaseGridOpParam(BaseOpParam): + @field_validator("act_name", mode="before") + @classmethod + def check_act_name(cls, act_name: str) -> str: + return f"{act_name}_grid" + + +class TapGridOpParam(BaseGridOpParam): + area: int = Field(default=-1) + subarea: str = Field(default="") + + +class LongPressGridOpParam(BaseGridOpParam): + area: int = Field(default=-1) + subarea: str = Field(default="") + + +class SwipeGridOpParam(BaseGridOpParam): + start_area: int = Field(default=-1) + start_subarea: str = Field(default="") + end_area: int = Field(default=-1) + end_subarea: str = Field(default="") + + +# end =================== define different Action Op and its params ============= + + +class ReflectOp(BaseModel): + 
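"""Parsed result of the reflection step, mirroring the SelfLearnReflect ActionNode fields (Decision / Thought / Documentation).""" +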
decision: str = "" + thought: str = "" + documentation: str = "" + param_state: RunState = RunState.SUCCESS + + +class AndroidActionOutput(BaseModel): + data: dict = Field(default=dict()) + action_state: RunState = Field(default=RunState.SUCCESS) diff --git a/metagpt/ext/android_assistant/utils/utils.py b/metagpt/ext/android_assistant/utils/utils.py new file mode 100644 index 0000000000..f1fa138692 --- /dev/null +++ b/metagpt/ext/android_assistant/utils/utils.py @@ -0,0 +1,329 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : + +import re +from pathlib import Path +from typing import Union +from xml.etree.ElementTree import Element, iterparse + +import cv2 +import pyshine as ps + +from metagpt.config2 import config +from metagpt.ext.android_assistant.utils.schema import ( + ActionOp, + AndroidElement, + BaseGridOpParam, + BaseOpParam, + Decision, + GridOpParam, + LongPressGridOpParam, + LongPressOpParam, + ReflectOp, + RunState, + SwipeGridOpParam, + SwipeOpParam, + TapGridOpParam, + TapOpParam, + TextOpParam, +) +from metagpt.logs import logger + + +def get_id_from_element(elem: Element) -> str: + bounds = elem.attrib["bounds"][1:-1].split("][") + x1, y1 = map(int, bounds[0].split(",")) + x2, y2 = map(int, bounds[1].split(",")) + elem_w, elem_h = x2 - x1, y2 - y1 + if "resource-id" in elem.attrib and elem.attrib["resource-id"]: + elem_id = elem.attrib["resource-id"].replace(":", ".").replace("/", "_") + else: + elem_id = f"{elem.attrib['class']}_{elem_w}_{elem_h}" + if "content-desc" in elem.attrib and elem.attrib["content-desc"] and len(elem.attrib["content-desc"]) < 20: + content_desc = elem.attrib["content-desc"].replace("/", "_").replace(" ", "").replace(":", "_") + elem_id += f"_{content_desc}" + return elem_id + + +def traverse_xml_tree(xml_path: Path, elem_list: list[AndroidElement], attrib: str, add_index=False): + path = [] + extra_config = config.extra + for event, elem in iterparse(str(xml_path), ["start", "end"]): + if event == "start": + path.append(elem) + if attrib in elem.attrib and elem.attrib[attrib] == "true": + parent_prefix = "" + if len(path) > 1: + parent_prefix = get_id_from_element(path[-2]) + bounds = elem.attrib["bounds"][1:-1].split("][") + x1, y1 = map(int, bounds[0].split(",")) + x2, y2 = map(int, bounds[1].split(",")) + center = (x1 + x2) // 2, (y1 + y2) // 2 + elem_id = get_id_from_element(elem) + if parent_prefix: + elem_id = parent_prefix + "_" + elem_id + if add_index: + elem_id += f"_{elem.attrib['index']}" + close = False + for e in elem_list: + bbox = e.bbox + center_ = (bbox[0][0] + bbox[1][0]) // 2, (bbox[0][1] + bbox[1][1]) // 2 + dist = (abs(center[0] - center_[0]) ** 2 + abs(center[1] - center_[1]) ** 2) ** 0.5 + if dist <= extra_config.get("min_dist", 30): + close = True + break + if not close: + elem_list.append(AndroidElement(uid=elem_id, bbox=((x1, y1), (x2, y2)), attrib=attrib)) + + if event == "end": + path.pop() + + +def elem_list_from_xml_tree(xml_path: Path, useless_list: list[str], min_dist: int) -> list[AndroidElement]: + clickable_list = [] + focusable_list = [] + traverse_xml_tree(xml_path, clickable_list, "clickable", True) + traverse_xml_tree(xml_path, focusable_list, "focusable", True) + elem_list = [] + for elem in clickable_list: + if elem.uid in useless_list: + continue + elem_list.append(elem) + for elem in focusable_list: + if elem.uid in useless_list: + continue + bbox = elem.bbox + center = (bbox[0][0] + bbox[1][0]) // 2, (bbox[0][1] + bbox[1][1]) // 2 + close = False + for e in clickable_list: + bbox = e.bbox + 
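# compare centers: a focusable element whose center lies within min_dist of an already-kept clickable element is treated as a duplicate hit target +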
center_ = (bbox[0][0] + bbox[1][0]) // 2, (bbox[0][1] + bbox[1][1]) // 2 + dist = (abs(center[0] - center_[0]) ** 2 + abs(center[1] - center_[1]) ** 2) ** 0.5 + if dist <= min_dist: + close = True + break + if not close: + elem_list.append(elem) + return elem_list + + +def draw_bbox_multi( + img_path: Path, + output_path: Path, + elem_list: list[AndroidElement], + record_mode: bool = False, + dark_mode: bool = False, +): + imgcv = cv2.imread(str(img_path)) + count = 1 + for elem in elem_list: + try: + top_left = elem.bbox[0] + bottom_right = elem.bbox[1] + left, top = top_left[0], top_left[1] + right, bottom = bottom_right[0], bottom_right[1] + label = str(count) + if record_mode: + if elem.attrib == "clickable": + color = (250, 0, 0) + elif elem.attrib == "focusable": + color = (0, 0, 250) + else: + color = (0, 250, 0) + imgcv = ps.putBText( + imgcv, + label, + text_offset_x=(left + right) // 2 + 10, + text_offset_y=(top + bottom) // 2 + 10, + vspace=10, + hspace=10, + font_scale=1, + thickness=2, + background_RGB=color, + text_RGB=(255, 250, 250), + alpha=0.5, + ) + else: + text_color = (10, 10, 10) if dark_mode else (255, 250, 250) + bg_color = (255, 250, 250) if dark_mode else (10, 10, 10) + imgcv = ps.putBText( + imgcv, + label, + text_offset_x=(left + right) // 2 + 10, + text_offset_y=(top + bottom) // 2 + 10, + vspace=10, + hspace=10, + font_scale=1, + thickness=2, + background_RGB=bg_color, + text_RGB=text_color, + alpha=0.5, + ) + except Exception as e: + logger.error(f"ERROR: An exception occurs while labeling the image\n{e}") + count += 1 + cv2.imwrite(str(output_path), imgcv) + return imgcv + + +def draw_grid(img_path: Path, output_path: Path) -> tuple[int, int]: + def get_unit_len(n): + for i in range(1, n + 1): + if n % i == 0 and 120 <= i <= 180: + return i + return -1 + + image = cv2.imread(str(img_path)) + height, width, _ = image.shape + color = (255, 116, 113) + unit_height = get_unit_len(height) + if unit_height < 0: + unit_height = 120 + unit_width = get_unit_len(width) + if unit_width < 0: + unit_width = 120 + thick = int(unit_width // 50) + rows = height // unit_height + cols = width // unit_width + for i in range(rows): + for j in range(cols): + label = i * cols + j + 1 + left = int(j * unit_width) + top = int(i * unit_height) + right = int((j + 1) * unit_width) + bottom = int((i + 1) * unit_height) + cv2.rectangle(image, (left, top), (right, bottom), color, thick // 2) + cv2.putText( + image, + str(label), + (left + int(unit_width * 0.05) + 3, top + int(unit_height * 0.3) + 3), + 0, + int(0.01 * unit_width), + (0, 0, 0), + thick, + ) + cv2.putText( + image, + str(label), + (left + int(unit_width * 0.05), top + int(unit_height * 0.3)), + 0, + int(0.01 * unit_width), + color, + thick, + ) + cv2.imwrite(str(output_path), image) + return rows, cols + + +def area_to_xy(area: int, subarea: str, width: int, height: int, rows: int, cols: int) -> tuple[int, int]: + area -= 1 + row, col = area // cols, area % cols + x_0, y_0 = col * (width // cols), row * (height // rows) + if subarea == "top-left": + x, y = x_0 + (width // cols) // 4, y_0 + (height // rows) // 4 + elif subarea == "top": + x, y = x_0 + (width // cols) // 2, y_0 + (height // rows) // 4 + elif subarea == "top-right": + x, y = x_0 + (width // cols) * 3 // 4, y_0 + (height // rows) // 4 + elif subarea == "left": + x, y = x_0 + (width // cols) // 4, y_0 + (height // rows) // 2 + elif subarea == "right": + x, y = x_0 + (width // cols) * 3 // 4, y_0 + (height // rows) // 2 + elif subarea == "bottom-left": + x, y = 
x_0 + (width // cols) // 4, y_0 + (height // rows) * 3 // 4 + elif subarea == "bottom": + x, y = x_0 + (width // cols) // 2, y_0 + (height // rows) * 3 // 4 + elif subarea == "bottom-right": + x, y = x_0 + (width // cols) * 3 // 4, y_0 + (height // rows) * 3 // 4 + else: + x, y = x_0 + (width // cols) // 2, y_0 + (height // rows) // 2 + return x, y + + +def elem_bbox_to_xy(bbox: tuple[tuple[int, int], tuple[int, int]]) -> tuple[int, int]: + tl, br = bbox + x, y = (tl[0] + br[0]) // 2, (tl[1] + br[1]) // 2 + return x, y + + +def reflect_parse_extarct(parsed_json: dict) -> ReflectOp: + decision = parsed_json.get("Decision") + if decision not in Decision.values(): + op = ReflectOp(param_state=RunState.FAIL) + else: + op = ReflectOp( + decision=parsed_json.get("Decision"), + thought=parsed_json.get("Thought"), + documentation=parsed_json.get("Documentation"), + ) + return op + + +def screenshot_parse_extract( + parsed_json: dict, grid_on: bool = False +) -> Union[BaseOpParam, BaseGridOpParam, GridOpParam]: + act = parsed_json.get("Action") + last_act = parsed_json.get("Summary") + act_name = act.split("(")[0] + + if RunState.FINISH.value.upper() in act: + return BaseOpParam(param_state=RunState.FINISH) + + if grid_on: + return screenshot_parse_extract_with_grid(act_name, act, last_act) + else: + return screenshot_parse_extract_without_grid(act_name, act, last_act) + + +def op_params_clean(params: list[str]) -> list[Union[int, str]]: + param_values = [] + for param_value in params: + if '"' in param_value or "'" in param_value: # remove `"` + param_values.append(param_value.strip()[1:-1]) + else: + param_values.append(int(param_value)) + return param_values + + +def screenshot_parse_extract_without_grid(act_name: str, act: str, last_act: str) -> Union[BaseOpParam, GridOpParam]: + if act_name == ActionOp.TAP.value: + area = int(re.findall(r"tap\((.*?)\)", act)[0]) + op = TapOpParam(act_name=act_name, area=area, last_act=last_act) + elif act_name == ActionOp.TEXT.value: + input_str = re.findall(r"text\((.*?)\)", act)[0][1:-1] + op = TextOpParam(act_name=act_name, input_str=input_str, last_act=last_act) + elif act_name == ActionOp.LONG_PRESS.value: + area = int(re.findall(r"long_press\((.*?)\)", act)[0]) + op = LongPressOpParam(act_name=act_name, area=area, last_act=last_act) + elif act_name == ActionOp.SWIPE.value: + params = re.findall(r"swipe\((.*?)\)", act)[0].split(",") + params = op_params_clean(params) # area, swipe_orient, dist + op = SwipeOpParam(act_name=act_name, area=params[0], swipe_orient=params[1], dist=params[2], last_act=last_act) + elif act_name == ActionOp.GRID.value: + op = GridOpParam(act_name=act_name) + else: + op = BaseOpParam(param_state=RunState.FAIL) + return op + + +def screenshot_parse_extract_with_grid(act_name: str, act: str, last_act: str) -> Union[BaseGridOpParam, GridOpParam]: + if act_name == ActionOp.TAP.value: + params = re.findall(r"tap\((.*?)\)", act)[0].split(",") + params = op_params_clean(params) + op = TapGridOpParam(act_name=act_name, area=params[0], subarea=params[1], last_act=last_act) + elif act_name == ActionOp.LONG_PRESS.value: + params = re.findall(r"long_press\((.*?)\)", act)[0].split(",") + params = op_params_clean(params) + op = LongPressGridOpParam(act_name=act_name, area=params[0], subarea=params[1], last_act=last_act) + elif act_name == ActionOp.SWIPE.value: + params = re.findall(r"swipe\((.*?)\)", act)[0].split(",") + params = op_params_clean(params) + op = SwipeGridOpParam( + act_name=act_name, start_area=params[0], start_subarea=params[1], 
end_area=params[2], end_subarea=params[3] + ) + elif act_name == ActionOp.GRID.value: + op = GridOpParam(act_name=act_name) + else: + op = BaseGridOpParam(param_state=RunState.FAIL) + return op diff --git a/metagpt/ext/stanford_town/README.md b/metagpt/ext/stanford_town/README.md new file mode 100644 index 0000000000..1bdcac145f --- /dev/null +++ b/metagpt/ext/stanford_town/README.md @@ -0,0 +1,51 @@ +## Stanford Town Game + +### Preliminaries +To let the GA ([generative_agents](https://github.com/joonspk-research/generative_agents)) frontend consume the simulation data without changing its code, set `temp_storage_path` to the `temp_storage` path of `generative_agents` when starting `run_st_game.py`, like + +`python3 run_st_game.py --temp_storage_path path/to/ga/temp_storage xxx` + +Or change the paths in `const.py` as below + +``` +STORAGE_PATH = EXAMPLE_PATH.joinpath("storage") +TEMP_STORAGE_PATH = EXAMPLE_PATH.joinpath("temp_storage") +# updated +STORAGE_PATH = Path("{path/to/ga/storage}") +TEMP_STORAGE_PATH = Path("{path/to/ga/temp_storage}") +``` + +This connects the simulation data to GA without changing the GA code; otherwise, the GA code would have to be modified to match the MetaGPT output path. + +If you don't want to start from scratch, copy one of the simulation directories under `generative_agents/environment/frontend_server/storage/` to `examples/stanford_town/storage`, and use its directory name as the `fork_sim_code`. + +### Backend service startup +The entry point is `python3 run_st_game.py "Host a open lunch party at 13:00 pm" "base_the_ville_isabella_maria_klaus" "test_sim" 10` +or +`python3 run_st_game.py "Host a open lunch party at 13:00 pm" "base_the_ville_isabella_maria_klaus" "test_sim" 10 --temp_storage_path path/to/ga/temp_storage` + +`idea` is the user's message to the first agent; it propagates from that agent through the town, so you can check whether the multi-agent simulation ultimately achieves the goal of hosting or participating in the event. + +### Frontend service startup +Enter the project folder `generative_agents`. + +Enter `environment/frontend_server` and use `python3 manage.py runserver` to start the frontend service. +Visit `http://localhost:8000/simulator_home` to see the current simulation. + +## Acknowledgements +This reproduction draws on [generative_agents](https://github.com/joonspk-research/generative_agents); we thank the original authors. + +### Citation +```bib +@inproceedings{Park2023GenerativeAgents, +author = {Park, Joon Sung and O'Brien, Joseph C. 
and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S.}, +title = {Generative Agents: Interactive Simulacra of Human Behavior}, +year = {2023}, +publisher = {Association for Computing Machinery}, +address = {New York, NY, USA}, +booktitle = {In the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23)}, +keywords = {Human-AI interaction, agents, generative AI, large language models}, +location = {San Francisco, CA, USA}, +series = {UIST '23} +} +``` \ No newline at end of file diff --git a/metagpt/ext/stanford_town/README_CN.md b/metagpt/ext/stanford_town/README_CN.md new file mode 100644 index 0000000000..3daf68d08f --- /dev/null +++ b/metagpt/ext/stanford_town/README_CN.md @@ -0,0 +1,50 @@ +## Stanford Town Game + +### 前置 +为了方便GA( [generative_agents](https://github.com/joonspk-research/generative_agents) )的前端对接数据(避免改动它那块的代码),可在启动`run_st_game.py`加上`temp_storage_path`指向`generative_agents`对应的`temp_storage`路径。比如 + +`python3 run_st_game.py --temp_storage_path path/to/ga/temp_storage xxx` + +或将`const.py`下的 + +``` +STORAGE_PATH = EXAMPLE_PATH.joinpath("storage") +TEMP_STORAGE_PATH = EXAMPLE_PATH.joinpath("temp_storage") +# 更新为 +STORAGE_PATH = Path("{path/to/ga/storage}") +TEMP_STORAGE_PATH = Path("{path/to/ga/temp_storage}") +``` +这样可用实现不改变GA代码情况下,实现仿真数据的对接。不然得修改GA的代码来适配MG的输出路径。 + +如果你不想从0开始启动,拷贝`generative_agents/environment/frontend_server/storage/`下的其他仿真目录到`examples/stanford_town/storage`,并选择一个目录名作为`fork_sim_code`。 + +### 后端服务启动 +执行入口为:`python3 run_st_game.py "Host a open lunch party at 13:00 pm" "base_the_ville_isabella_maria_klaus" "test_sim" 10` +或者 +`python3 run_st_game.py "Host a open lunch party at 13:00 pm" "base_the_ville_isabella_maria_klaus" "test_sim" 10 --temp_storage_path path/to/ga/temp_storage` + +`idea`为用户给第一个Agent的用户心声,并通过这个心声进行传播,看最后多智能体是否达到举办、参加活动的目标。 + +### 前端服务启动 +进入`generative_agents`项目目录 + +进入`environment/frontend_server`,使用`python3 manage.py runserver`启动前端服务。 +访问`http://localhost:8000/simulator_home` 进入当前的仿真界面。 + +## 致谢 +复现工作参考了 [generative_agents](https://github.com/joonspk-research/generative_agents), 感谢相关作者们。 + +### 引用 +```bib +@inproceedings{Park2023GenerativeAgents, +author = {Park, Joon Sung and O'Brien, Joseph C. and Cai, Carrie J. 
and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S.}, +title = {Generative Agents: Interactive Simulacra of Human Behavior}, +year = {2023}, +publisher = {Association for Computing Machinery}, +address = {New York, NY, USA}, +booktitle = {In the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23)}, +keywords = {Human-AI interaction, agents, generative AI, large language models}, +location = {San Francisco, CA, USA}, +series = {UIST '23} +} +``` diff --git a/metagpt/ext/stanford_town/__init__.py b/metagpt/ext/stanford_town/__init__.py new file mode 100644 index 0000000000..56ea35c9f7 --- /dev/null +++ b/metagpt/ext/stanford_town/__init__.py @@ -0,0 +1,3 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : stanford town implementation diff --git a/metagpt/ext/stanford_town/actions/__init__.py b/metagpt/ext/stanford_town/actions/__init__.py new file mode 100644 index 0000000000..2bcf8efd09 --- /dev/null +++ b/metagpt/ext/stanford_town/actions/__init__.py @@ -0,0 +1,3 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : diff --git a/metagpt/ext/stanford_town/actions/agent_chat_sum_rel.py b/metagpt/ext/stanford_town/actions/agent_chat_sum_rel.py new file mode 100644 index 0000000000..98d370bb07 --- /dev/null +++ b/metagpt/ext/stanford_town/actions/agent_chat_sum_rel.py @@ -0,0 +1,39 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : summarize the relationship in an agent chat + +from metagpt.ext.stanford_town.actions.st_action import STAction +from metagpt.logs import logger + + +class AgentChatSumRel(STAction): + name: str = "AgentChatSumRel" + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + resp = False + try: + _ = llm_resp.split('"')[0].strip() + resp = True + except Exception: + pass + return resp + + def _func_cleanup(self, llm_resp: str, prompt: str) -> str: + return llm_resp.split('"')[0].strip() + + def _func_fail_default_resp(self) -> str: + pass + + async def run(self, init_role: "STRole", target_role: "STRole", statements: str) -> str: + def create_prompt_input(init_role: "STRole", target_role: "STRole", statements: str) -> list: + prompt_input = [statements, init_role.name, target_role.name] + return prompt_input + + prompt_input = create_prompt_input(init_role, target_role, statements) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "summarize_chat_relationship_v2.txt") + + example_output = "Jane Doe is working on a project" + special_instruction = "The output should be a string that responds to the question." 
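+        # Assumption: STAction._run_gpt35 builds the final request from (prompt, example_output, special_instruction), then applies _func_validate/_func_cleanup to the raw response, falling back to fail_default_resp when validation keeps failing.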
+ output = await self._run_gpt35(prompt, example_output, special_instruction) + logger.info(f"Role: {init_role.name} Action: {self.cls_name} output: {output}") + return output diff --git a/metagpt/ext/stanford_town/actions/decide_to_talk.py b/metagpt/ext/stanford_town/actions/decide_to_talk.py new file mode 100644 index 0000000000..a393f31af7 --- /dev/null +++ b/metagpt/ext/stanford_town/actions/decide_to_talk.py @@ -0,0 +1,97 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : decide whether to talk to another role, return yes or no + +from metagpt.ext.stanford_town.actions.st_action import STAction +from metagpt.logs import logger + + +class DecideToTalk(STAction): + name: str = "DecideToTalk" + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + resp = False + try: + if llm_resp.split("Answer in yes or no:")[-1].strip().lower() in ["yes", "no"]: + resp = True + except ValueError: + pass + return resp + + def _func_cleanup(self, llm_resp: str, prompt: str) -> str: + return llm_resp.split("Answer in yes or no:")[-1].strip().lower() + + def _func_fail_default_resp(self) -> str: + return "yes" + + async def run(self, init_role: "STRole", target_role: "STRole", retrieved: dict, *args, **kwargs) -> bool: + """Run action""" + + def create_prompt_input(init_role: "STRole", target_role: "STRole", retrieved: dict) -> list: + scratch = init_role.rc.scratch + target_scratch = target_role.rc.scratch + last_chat = init_role.rc.memory.get_last_chat(target_role.name) + last_chatted_time = "" + last_chat_about = "" + if last_chat: + last_chatted_time = last_chat.created.strftime("%B %d, %Y, %H:%M:%S") + last_chat_about = last_chat.description + + context = "" + for c_node in retrieved["events"]: + curr_desc = c_node.description.split(" ") + curr_desc[2:3] = ["was"] + curr_desc = " ".join(curr_desc) + context += f"{curr_desc}. " + context += "\n" + for c_node in retrieved["thoughts"]: + context += f"{c_node.description}. 
" + + curr_time = scratch.curr_time.strftime("%B %d, %Y, %H:%M:%S %p") + init_act_desc = scratch.act_description + if "(" in init_act_desc: + init_act_desc = init_act_desc.split("(")[-1][:-1] + + if len(scratch.planned_path) == 0 and "waiting" not in init_act_desc: + init_p_desc = f"{init_role.name} is already {init_act_desc}" + elif "waiting" in init_act_desc: + init_p_desc = f"{init_role.name} is {init_act_desc}" + else: + init_p_desc = f"{init_role.name} is on the way to {init_act_desc}" + + target_act_desc = target_scratch.act_description + if "(" in target_act_desc: + target_act_desc = target_act_desc.split("(")[-1][:-1] + + if len(target_scratch.planned_path) == 0 and "waiting" not in target_act_desc: + target_p_desc = f"{target_role.name} is already {target_act_desc}" + elif "waiting" in target_act_desc: + target_p_desc = f"{target_role.name} is {target_act_desc}" + else: + target_p_desc = f"{target_role.name} is on the way to {target_act_desc}" + + prompt_input = [] + prompt_input += [context] + + prompt_input += [curr_time] + + prompt_input += [init_role.name] + prompt_input += [target_role.name] + prompt_input += [last_chatted_time] + prompt_input += [last_chat_about] + + prompt_input += [init_p_desc] + prompt_input += [target_p_desc] + prompt_input += [init_role.name] + prompt_input += [target_role.name] + return prompt_input + + prompt_input = create_prompt_input(init_role, target_role, retrieved) + prompt = self.generate_prompt_with_tmpl_filename( + prompt_input=prompt_input, tmpl_filename="decide_to_talk_v2.txt" + ) + self.fail_default_resp = self._func_fail_default_resp() + output = await self._run_gpt35_max_tokens(prompt, max_tokens=20) # yes or no + result = output == "yes" + logger.info(f"Role: {init_role.name} Action: {self.cls_name} output: {result}") + return result diff --git a/metagpt/ext/stanford_town/actions/dummy_action.py b/metagpt/ext/stanford_town/actions/dummy_action.py new file mode 100644 index 0000000000..a5004d5ef3 --- /dev/null +++ b/metagpt/ext/stanford_town/actions/dummy_action.py @@ -0,0 +1,20 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : dummy action so that every STRole can handle the DummyMessage produced by DummyAction + +from metagpt.actions import Action +from metagpt.schema import Message + + +class DummyAction(Action): + async def run(self, *args, **kwargs): + raise NotImplementedError + + +class DummyMessage(Message): + """ + dummy message passed to roles so that each of them executes every round + """ + + content: str = "dummy" + cause_by: str = "DummyAction" diff --git a/metagpt/ext/stanford_town/actions/gen_action_details.py b/metagpt/ext/stanford_town/actions/gen_action_details.py new file mode 100644 index 0000000000..8e268a723a --- /dev/null +++ b/metagpt/ext/stanford_town/actions/gen_action_details.py @@ -0,0 +1,401 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : gen_action_details + +import random + +from metagpt.environment.stanford_town.env_space import EnvObsParams, EnvObsType +from metagpt.ext.stanford_town.actions.st_action import STAction +from metagpt.logs import logger + + +class GenActionSector(STAction): + name: str = "GenActionSector" + + def _func_cleanup(self, llm_resp: str, prompt: str): + cleaned_response = llm_resp.split("}")[0] + return cleaned_response + + def _func_validate(self, llm_resp: str, prompt: str): + if len(llm_resp.strip()) < 1: + return False + if "}" not in llm_resp: + return False + if "," in llm_resp: + return False + return True + + def 
_func_fail_default_resp(self): + fs = "kitchen" + return fs + + async def run(self, role: "STRole", access_tile: dict[str, str], act_desp: str): + def create_prompt_input(role, access_tile: dict[str, str], act_desp): + act_world = f"{access_tile['world']}" + + prompt_input = [] + + prompt_input += [role.scratch.get_str_name()] + prompt_input += [role.scratch.living_area.split(":")[1]] + x = f"{act_world}:{role.scratch.living_area.split(':')[1]}" + prompt_input += [role.s_mem.get_str_accessible_sector_arenas(x)] + + prompt_input += [role.scratch.get_str_name()] + prompt_input += [f"{access_tile['sector']}"] + x = f"{act_world}:{access_tile['sector']}" + prompt_input += [role.s_mem.get_str_accessible_sector_arenas(x)] + + if role.scratch.get_str_daily_plan_req() != "": + prompt_input += [f"\n{role.scratch.get_str_daily_plan_req()}"] + else: + prompt_input += [""] + + # MAR 11 TEMP + prompt_input = [] + act_world = access_tile["world"] + accessible_sector_str = role.s_mem.get_str_accessible_sectors(act_world) + curr = accessible_sector_str.split(", ") + fin_accessible_sectors = [] + for i in curr: + if "'s house" in i: + if role.scratch.last_name in i: + fin_accessible_sectors += [i] + else: + fin_accessible_sectors += [i] + accessible_sector_str = ", ".join(fin_accessible_sectors) + # END MAR 11 TEMP + + prompt_input += [accessible_sector_str] + + act_desp_1 = act_desp + act_desp_2 = act_desp + if "(" in act_desp: + act_desp_1 = act_desp.split("(")[0].strip() + act_desp_2 = act_desp.split("(")[-1][:-1] + prompt_input += [role.scratch.get_str_name()] + prompt_input += [act_desp_1] + + prompt_input += [act_desp_2] + prompt_input += [role.scratch.get_str_name()] + return prompt_input + + prompt_template = "action_location_sector_v1.txt" + prompt_input = create_prompt_input(role, access_tile, act_desp) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, prompt_template) + + self.fail_default_resp = self._func_fail_default_resp() + output = await self._run_gpt35_max_tokens(prompt, max_tokens=15) + y = f"{access_tile['world']}" + x = [i.strip() for i in role.s_mem.get_str_accessible_sectors(y).split(",")] + if output not in x: + # output = random.choice(x) + output = role.scratch.living_area.split(":")[1] + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +class GenActionArena(STAction): + name: str = "GenActionArena" + + def _func_cleanup(self, llm_resp: str, prompt: str): + cleaned_response = llm_resp.split("}")[0] + return cleaned_response + + def _func_validate(self, llm_resp: str, prompt: str): + if len(llm_resp.strip()) < 1: + return False + if "}" not in llm_resp: + return False + if "," in llm_resp: + return False + return True + + def _func_fail_default_resp(self): + fs = "kitchen" + return fs + + async def run(self, role: "STRole", act_desp: str, act_world: str, act_sector: str): + def create_prompt_input(role, act_desp, act_world, act_sector): + prompt_input = [] + prompt_input += [role.scratch.get_str_name()] + x = f"{act_world}:{act_sector}" + prompt_input += [act_sector] + + # MAR 11 TEMP + accessible_arena_str = role.s_mem.get_str_accessible_sector_arenas(x) + curr = accessible_arena_str.split(", ") + fin_accessible_arenas = [] + for i in curr: + if "'s room" in i: + if role.scratch.last_name in i: + fin_accessible_arenas += [i] + else: + fin_accessible_arenas += [i] + accessible_arena_str = ", ".join(fin_accessible_arenas) + # END MAR 11 TEMP + prompt_input += [accessible_arena_str] + act_desp_1 = act_desp + act_desp_2 = 
act_desp + if "(" in act_desp: + act_desp_1 = act_desp.split("(")[0].strip() + act_desp_2 = act_desp.split("(")[-1][:-1] + prompt_input += [role.scratch.get_str_name()] + prompt_input += [act_desp_1] + + prompt_input += [act_desp_2] + prompt_input += [role.scratch.get_str_name()] + + prompt_input += [act_sector] + prompt_input += [accessible_arena_str] + return prompt_input + + prompt_template = "action_location_object_vMar11.txt" + prompt_input = create_prompt_input(role, act_desp, act_world, act_sector) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, prompt_template) + self.fail_default_resp = self._func_fail_default_resp() + output = await self._run_gpt35_max_tokens(prompt, max_tokens=15) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +class GenActionObject(STAction): + name: str = "GenActionObject" + + def _func_validate(self, llm_resp: str, prompt: str): + if len(llm_resp.strip()) < 1: + return False + return True + + def _func_cleanup(self, llm_resp: str, prompt: str): + cleaned_response = llm_resp.strip() + return cleaned_response + + def _func_fail_default_resp(self): + fs = "bed" + return fs + + async def run(self, role: "STRole", act_desp: str, temp_address: str): + def create_prompt_input(role, act_desp, temp_address): + prompt_input = [] + if "(" in act_desp: + act_desp = act_desp.split("(")[-1][:-1] + + prompt_input += [act_desp] + prompt_input += [role.s_mem.get_str_accessible_arena_game_objects(temp_address)] + return prompt_input + + prompt_template = "action_object_v2.txt" + prompt_input = create_prompt_input(role, act_desp, temp_address) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, prompt_template) + self.fail_default_resp = self._func_fail_default_resp() + output = await self._run_gpt35_max_tokens(prompt, max_tokens=15) + x = [i.strip() for i in role.s_mem.get_str_accessible_arena_game_objects(temp_address).split(",")] + if output not in x: + output = random.choice(x) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +class GenPronunciatio(STAction): + name: str = "GenPronunciatio" + + def _func_cleanup(self, llm_resp: str, prompt: str): + cr = llm_resp.strip() + if len(cr) > 3: + cr = cr[:3] + return cr + + def _func_validate(self, llm_resp: str, prompt: str): + try: + self._func_cleanup(llm_resp, prompt="") + if len(llm_resp) == 0: + return False + except Exception: + return False + return True + + def _func_fail_default_resp(self): + fs = "😋" + return fs + + async def run(self, role: "STRole", act_desp: str): + def create_prompt_input(act_desp): + if "(" in act_desp: + act_desp = act_desp.split("(")[-1].split(")")[0] + prompt_input = [act_desp] + return prompt_input + + prompt_template = "generate_pronunciatio_v1.txt" + prompt_input = create_prompt_input(act_desp) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, prompt_template) + example_output = "🛁🧖‍♀️" + special_instruction = "The value for the output must ONLY contain the emojis." 
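+        # _func_cleanup above keeps at most the first 3 characters of the reply, so an over-long emoji answer still validates.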
+ self.fail_default_resp = self._func_fail_default_resp() + output = await self._run_gpt35(prompt, example_output, special_instruction) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +class GenEventTriple(STAction): + name: str = "GenEventTriple" + + def _func_cleanup(self, llm_resp: str, prompt: str): + cr = llm_resp.strip() + cr = [i.strip() for i in cr.split(")")[0].split(",")] + return cr + + def _func_validate(self, llm_resp: str, prompt: str): + try: + llm_resp = self._func_cleanup(llm_resp, prompt="") + if len(llm_resp) != 2: + return False + except Exception: + return False + return True + + def _func_fail_default_resp(self, role): + fs = (role.name, "is", "idle") + return fs + + async def run(self, role: "STRole", act_desp: str): + def create_prompt_input(role, act_desp): + if "(" in act_desp: + act_desp = act_desp.split("(")[-1].split(")")[0] + prompt_input = [role.name, act_desp, role.name] + return prompt_input + + prompt_template = "generate_event_triple_v1.txt" + prompt_input = create_prompt_input(role, act_desp) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, prompt_template) + self.fail_default_resp = self._func_fail_default_resp(role) + output = await self._run_gpt35_max_tokens(prompt, max_tokens=30) + output = (role.name, output[0], output[1]) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +class GenActObjDescription(STAction): + name: str = "GenActObjDescription" + + def _func_cleanup(self, llm_resp: str, prompt: str): + cr = llm_resp.strip() + if cr[-1] == ".": + cr = cr[:-1] + return cr + + def _func_validate(self, llm_resp: str, prompt: str): + try: + llm_resp = self._func_cleanup(llm_resp, prompt="") + except Exception: + return False + return True + + def _func_fail_default_resp(self, act_game_object): + fs = f"{act_game_object} is idle" + return fs + + async def run(self, role: "STRole", act_game_object: str, act_desp: str): + def create_prompt_input(act_game_object, act_desp, role): + prompt_input = [act_game_object, role.name, act_desp, act_game_object, act_game_object] + return prompt_input + + prompt_template = "generate_obj_event_v1.txt" + prompt_input = create_prompt_input(act_game_object, act_desp, role) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, prompt_template) + example_output = "being fixed" + special_instruction = "The output should ONLY contain the phrase that should go in ." 
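+        # If the response never validates, the default "{act_game_object} is idle" keeps downstream object events well-formed.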
+ self.fail_default_resp = self._func_fail_default_resp(act_game_object) + output = await self._run_gpt35(prompt, example_output, special_instruction) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +class GenObjEventTriple(STAction): + name: str = "GenObjEventTriple" + + def _func_cleanup(self, llm_resp: str, prompt: str): + cr = llm_resp.strip() + cr = [i.strip() for i in cr.split(")")[0].split(",")] + return cr + + def _func_validate(self, llm_resp: str, prompt: str): + try: + llm_resp = self._func_cleanup(llm_resp, prompt="") + if len(llm_resp) != 2: + return False + except Exception: + return False + return True + + def _func_fail_default_resp(self, act_game_object: str): + fs = (act_game_object, "is", "idle") + return fs + + async def run(self, role: "STRole", act_game_object, act_obj_desp): + def create_prompt_input(act_game_object, act_obj_desp): + prompt_input = [act_game_object, act_obj_desp, act_game_object] + return prompt_input + + prompt_template = "generate_event_triple_v1.txt" + prompt_input = create_prompt_input(act_game_object, act_obj_desp) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, prompt_template) + self.fail_default_resp = self._func_fail_default_resp(act_game_object) + output = await self._run_gpt35_max_tokens(prompt, max_tokens=30) + output = (act_game_object, output[0], output[1]) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +class GenActionDetails(STAction): + name: str = "GenActionDetails" + + def _func_cleanup(self, llm_resp: str, prompt: str) -> list: + pass + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + # TODO -- this sometimes generates error + try: + self._func_cleanup(llm_resp) + except Exception: + return False + return True + + def _func_fail_default_resp(self): + fs = {} + return fs + + async def run(self, role: "STRole", act_desp: str, act_dura): + access_tile = role.rc.env.observe( + obs_params=EnvObsParams(obs_type=EnvObsType.GET_TITLE, coord=role.scratch.curr_tile) + ) + act_world = access_tile["world"] + act_sector = await GenActionSector().run(role, access_tile, act_desp) + act_arena = await GenActionArena().run(role, act_desp, act_world, act_sector) + act_address = f"{act_world}:{act_sector}:{act_arena}" + if not role.s_mem.get_str_accessible_arena_game_objects(act_address): + act_game_object = "" + else: + act_game_object = await GenActionObject().run(role, act_desp, act_address) + new_address = f"{act_world}:{act_sector}:{act_arena}:{act_game_object}" + act_pron = await GenPronunciatio().run(role, act_desp) + act_event = await GenEventTriple().run(role, act_desp) + # Persona's actions also influence the object states. We set those up here. 
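+        # Recap of the pipeline above: sector -> arena -> game object narrows the action address ("world:sector:arena:object"); the description then yields an emoji pronunciatio and a (subject, predicate, object) event triple, mirrored below for the object side.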
+ act_obj_desp = await GenActObjDescription().run(role, act_game_object, act_desp) + act_obj_pron = await GenPronunciatio().run(role, act_obj_desp) + act_obj_event = await GenObjEventTriple().run(role, act_game_object, act_obj_desp) + result_dict = { + "action_address": new_address, + "action_duration": int(act_dura), + "action_description": act_desp, + "action_pronunciatio": act_pron, + "action_event": act_event, + "chatting_with": None, + "chat": None, + "chatting_with_buffer": None, + "chatting_end_time": None, + "act_obj_description": act_obj_desp, + "act_obj_pronunciatio": act_obj_pron, + "act_obj_event": act_obj_event, + } + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {result_dict}") + return result_dict diff --git a/metagpt/ext/stanford_town/actions/gen_daily_schedule.py b/metagpt/ext/stanford_town/actions/gen_daily_schedule.py new file mode 100644 index 0000000000..5dffa89952 --- /dev/null +++ b/metagpt/ext/stanford_town/actions/gen_daily_schedule.py @@ -0,0 +1,60 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : gen_daily_schedule + + +from metagpt.ext.stanford_town.actions.st_action import STAction +from metagpt.logs import logger + + +class GenDailySchedule(STAction): + name: str = "GenDailySchedule" + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + try: + self._func_cleanup(llm_resp, prompt="") + except Exception: + return False + return True + + def _func_cleanup(self, llm_resp: str, prompt: str) -> list: + cr = [] + _cr = llm_resp.split(")") + for i in _cr: + if i[-1].isdigit(): + i = i[:-1].strip() + if i[-1] == "." or i[-1] == ",": + cr += [i[:-1].strip()] + return cr + + def _func_fail_default_resp(self) -> int: + fs = [ + "wake up and complete the morning routine at 6:00 am", + "eat breakfast at 7:00 am", + "read a book from 8:00 am to 12:00 pm", + "have lunch at 12:00 pm", + "take a nap from 1:00 pm to 4:00 pm", + "relax and watch TV from 7:00 pm to 8:00 pm", + "go to bed at 11:00 pm", + ] + return fs + + async def run(self, role: "STRole", wake_up_hour: str): + def create_prompt_input(role, wake_up_hour): + prompt_input = [] + prompt_input += [role.scratch.get_str_iss()] + prompt_input += [role.scratch.get_str_lifestyle()] + prompt_input += [role.scratch.get_str_curr_date_str()] + prompt_input += [role.scratch.get_str_firstname()] + prompt_input += [f"{str(wake_up_hour)}:00 am"] + return prompt_input + + wake_up_hour = int(wake_up_hour) + prompt_template = "daily_planning_v6.txt" + prompt_input = create_prompt_input(role, wake_up_hour) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, prompt_template) + self.fail_default_resp = self._func_fail_default_resp() + output = await self._run_gpt35_max_tokens(prompt, max_tokens=500) + output = [f"wake up and complete the morning routine at {wake_up_hour}:00 am"] + output + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output diff --git a/metagpt/ext/stanford_town/actions/gen_hourly_schedule.py b/metagpt/ext/stanford_town/actions/gen_hourly_schedule.py new file mode 100644 index 0000000000..5d59f96dda --- /dev/null +++ b/metagpt/ext/stanford_town/actions/gen_hourly_schedule.py @@ -0,0 +1,181 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : gen_hourly_schedule + +import random +import string + +from metagpt.logs import logger + +from .st_action import STAction + + +def get_random_alphanumeric(i=6, j=6): + """ + Returns a random alpha numeric strength that has the length of somewhere + between i and j. 
+
+    INPUT:
+        i: min_range for the length
+        j: max_range for the length
+    OUTPUT:
+        an alphanumeric str with a length somewhere between i and j.
+    """
+    k = random.randint(i, j)
+    x = "".join(random.choices(string.ascii_letters + string.digits, k=k))
+    return x
+
+
+class GenHourlySchedule(STAction):
+    name: str = "GenHourlySchedule"
+
+    def _func_validate(self, llm_resp: str, prompt: str) -> bool:
+        try:
+            self._func_cleanup(llm_resp, prompt="")
+        except Exception:
+            return False
+        return True
+
+    def _func_cleanup(self, llm_resp: str, prompt: str) -> str:
+        cr = llm_resp.strip()
+        if cr[-1] == ".":
+            cr = cr[:-1]
+        # to only use the first line of output
+        cr = cr.split("\n")[0]
+        return cr
+
+    def _func_fail_default_resp(self) -> str:
+        fs = "asleep"
+        return fs
+
+    async def _generate_schedule_for_given_hour(
+        self, role: "STRole", curr_hour_str, p_f_ds_hourly_org, hour_str, intermission2=None
+    ):
+        def create_prompt_input(persona, curr_hour_str, p_f_ds_hourly_org, hour_str, intermission2=None):
+            schedule_format = ""
+            for i in hour_str:
+                schedule_format += f"[{persona.scratch.get_str_curr_date_str()} -- {i}]"
+                schedule_format += " Activity: [Fill in]\n"
+            schedule_format = schedule_format[:-1]
+
+            intermission_str = "Here is the originally intended hourly breakdown of"
+            intermission_str += f" {persona.scratch.get_str_firstname()}'s schedule today: "
+            for count, i in enumerate(persona.scratch.daily_req):
+                intermission_str += f"{str(count + 1)}) {i}, "
+            intermission_str = intermission_str[:-2]
+
+            prior_schedule = ""
+            if p_f_ds_hourly_org:
+                prior_schedule = "\n"
+                for count, i in enumerate(p_f_ds_hourly_org):
+                    prior_schedule += f"[(ID:{get_random_alphanumeric()})"
+                    prior_schedule += f" {persona.scratch.get_str_curr_date_str()} --"
+                    prior_schedule += f" {hour_str[count]}] Activity:"
+                    prior_schedule += f" {persona.scratch.get_str_firstname()}"
+                    prior_schedule += f" is {i}\n"
+
+            prompt_ending = f"[(ID:{get_random_alphanumeric()})"
+            prompt_ending += f" {persona.scratch.get_str_curr_date_str()}"
+            prompt_ending += f" -- {curr_hour_str}] Activity:"
+            prompt_ending += f" {persona.scratch.get_str_firstname()} is"
+
+            if intermission2:
+                intermission2 = f"\n{intermission2}"
+
+            prompt_input = []
+            prompt_input += [schedule_format]
+            prompt_input += [persona.scratch.get_str_iss()]
+
+            prompt_input += [prior_schedule + "\n"]
+            prompt_input += [intermission_str]
+            if intermission2:
+                prompt_input += [intermission2]
+            else:
+                prompt_input += [""]
+            prompt_input += [prompt_ending]
+
+            return prompt_input
+
+        prompt_template = "generate_hourly_schedule_v2.txt"
+        prompt_input = create_prompt_input(role, curr_hour_str, p_f_ds_hourly_org, hour_str, intermission2)
+        prompt_input_str = "\n".join(prompt_input)
+        prompt = self.generate_prompt_with_tmpl_filename(prompt_input, prompt_template)
+        self.fail_default_resp = self._func_fail_default_resp()
+        output = await self._run_gpt35_max_tokens(prompt, max_tokens=50)
+        logger.info(
+            f"Role: {role.name} _generate_schedule_for_given_hour prompt_input: {prompt_input_str}, "
+            f"output: {output}"
+        )
+        return output
+
+    async def run(self, role: "STRole", wake_up_hour: int):
+        hour_str = [
+            "00:00 AM",
+            "01:00 AM",
+            "02:00 AM",
+            "03:00 AM",
+            "04:00 AM",
+            "05:00 AM",
+            "06:00 AM",
+            "07:00 AM",
+            "08:00 AM",
+            "09:00 AM",
+            "10:00 AM",
+            "11:00 AM",
+            "12:00 PM",
+            "01:00 PM",
+            "02:00 PM",
+            "03:00 PM",
+            "04:00 PM",
+            "05:00 PM",
+            "06:00 PM",
+            "07:00 PM",
+            "08:00 PM",
+            "09:00 PM",
+            "10:00 PM",
+            "11:00 PM",
+        ]
+        n_m1_activity = []
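+        # Illustrative example: with wake_up_hour=7, the loop below fills the first
+        # seven slots of hour_str with "sleeping" and then queries the LLM once per
+        # remaining hour, so n_m1_activity might start as
+        # ["sleeping"] * 7 + ["waking up and starting her morning routine", ...].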
+ diversity_repeat_count = 1 # TODO mg 1->3 + for i in range(diversity_repeat_count): + logger.info(f"diversity_repeat_count idx: {i}") + n_m1_activity_set = set(n_m1_activity) + if len(n_m1_activity_set) < 5: + n_m1_activity = [] + for count, curr_hour_str in enumerate(hour_str): + if wake_up_hour > 0: + n_m1_activity += ["sleeping"] + wake_up_hour -= 1 + else: + logger.info(f"_generate_schedule_for_given_hour idx: {count}, n_m1_activity: {n_m1_activity}") + n_m1_activity += [ + await self._generate_schedule_for_given_hour(role, curr_hour_str, n_m1_activity, hour_str) + ] + + # Step 1. Compressing the hourly schedule to the following format: + # The integer indicates the number of hours. They should add up to 24. + # [['sleeping', 6], ['waking up and starting her morning routine', 1], + # ['eating breakfast', 1], ['getting ready for the day', 1], + # ['working on her painting', 2], ['taking a break', 1], + # ['having lunch', 1], ['working on her painting', 3], + # ['taking a break', 2], ['working on her painting', 2], + # ['relaxing and watching TV', 1], ['going to bed', 1], ['sleeping', 2]] + _n_m1_hourly_compressed = [] + prev = None + prev_count = 0 + for i in n_m1_activity: + if i != prev: + prev_count = 1 + _n_m1_hourly_compressed += [[i, prev_count]] + prev = i + elif _n_m1_hourly_compressed: + _n_m1_hourly_compressed[-1][1] += 1 + + # Step 2. Expand to min scale (from hour scale) + # [['sleeping', 360], ['waking up and starting her morning routine', 60], + # ['eating breakfast', 60],.. + n_m1_hourly_compressed = [] + for task, duration in _n_m1_hourly_compressed: + n_m1_hourly_compressed += [[task, duration * 60]] + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {n_m1_hourly_compressed}") + return n_m1_hourly_compressed diff --git a/metagpt/ext/stanford_town/actions/gen_iter_chat_utt.py b/metagpt/ext/stanford_town/actions/gen_iter_chat_utt.py new file mode 100644 index 0000000000..40f6d3af0e --- /dev/null +++ b/metagpt/ext/stanford_town/actions/gen_iter_chat_utt.py @@ -0,0 +1,125 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : generate_iterative_chat_utt + +from metagpt.environment.stanford_town.env_space import EnvObsParams, EnvObsType +from metagpt.ext.stanford_town.actions.st_action import STAction +from metagpt.ext.stanford_town.utils.utils import extract_first_json_dict +from metagpt.logs import logger + + +class GenIterChatUTT(STAction): + name: str = "GenIterChatUTT" + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + resp = False + try: + _ = extract_first_json_dict(llm_resp) + resp = True + except Exception: + pass + return resp + + def _func_cleanup(self, llm_resp: str, prompt: str) -> dict: + gpt_response = extract_first_json_dict(llm_resp) + + cleaned_dict = dict() + cleaned = [] + for key, val in gpt_response.items(): + cleaned += [val] + cleaned_dict["utterance"] = cleaned[0] + cleaned_dict["end"] = True + if "f" in str(cleaned[1]) or "F" in str(cleaned[1]): + cleaned_dict["end"] = False + + return cleaned_dict + + def _func_fail_default_resp(self) -> dict: + cleaned_dict = dict() + cleaned_dict["utterance"] = "..." 
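+        # Illustrative example: a reply such as '{"utterance": "Hi there!", "end": "f"}'
+        # is cleaned to {"utterance": "Hi there!", "end": False}; an "f"/"F" in the
+        # second value keeps the conversation open. This fallback keeps it open too,
+        # so the next turn can retry after an unparseable reply.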
+        cleaned_dict["end"] = False
+        return cleaned_dict
+
+    async def run(
+        self,
+        init_role: "STRole",
+        target_role: "STRole",
+        retrieved: dict,
+        curr_context: str,
+        curr_chat: list[str],
+        *args,
+        **kwargs,
+    ) -> dict:
+        def create_prompt_input(
+            access_tile: dict[str, str],
+            init_role: "STRole",
+            target_role: "STRole",
+            retrieved: dict,
+            curr_context: str,
+            curr_chat: list[str],
+        ):
+            role = init_role
+            scratch = role.rc.scratch
+            target_scratch = target_role.rc.scratch
+            prev_convo_insert = "\n"
+            if role.rc.memory.chat_list:
+                for i in role.rc.memory.chat_list:
+                    if i.object == target_role.name:
+                        v1 = int((scratch.curr_time - i.created).total_seconds() / 60)
+                        prev_convo_insert += (
+                            f"{str(v1)} minutes ago, {scratch.name} and "
+                            f"{target_scratch.name} were already {i.description} "
+                            f"This context takes place after that conversation."
+                        )
+                        break
+            if prev_convo_insert == "\n":
+                prev_convo_insert = ""
+            if role.rc.memory.chat_list:
+                if int((scratch.curr_time - role.rc.memory.chat_list[-1].created).total_seconds() / 60) > 480:
+                    prev_convo_insert = ""
+            logger.info(f"prev_convo_insert: {prev_convo_insert}")
+
+            curr_sector = f"{access_tile['sector']}"
+            curr_arena = f"{access_tile['arena']}"
+            curr_location = f"{curr_arena} in {curr_sector}"
+
+            retrieved_str = ""
+            for key, vals in retrieved.items():
+                for v in vals:
+                    retrieved_str += f"- {v.description}\n"
+
+            convo_str = ""
+            for i in curr_chat:
+                convo_str += ": ".join(i) + "\n"
+            if convo_str == "":
+                convo_str = "[The conversation has not started yet -- start it!]"
+
+            init_iss = f"Here is a brief description of {scratch.name}.\n{scratch.get_str_iss()}"
+            prompt_input = [
+                init_iss,
+                scratch.name,
+                retrieved_str,
+                prev_convo_insert,
+                curr_location,
+                curr_context,
+                scratch.name,
+                target_scratch.name,
+                convo_str,
+                scratch.name,
+                target_scratch.name,
+                scratch.name,
+                scratch.name,
+                scratch.name,
+            ]
+            return prompt_input
+
+        access_tile = init_role.rc.env.observe(
+            obs_params=EnvObsParams(obs_type=EnvObsType.GET_TITLE, coord=init_role.scratch.curr_tile)
+        )
+        prompt_input = create_prompt_input(access_tile, init_role, target_role, retrieved, curr_context, curr_chat)
+        prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "iterative_convo_v1.txt")
+        # original using `ChatGPT_safe_generate_response_OLD`
+        self.fail_default_resp = self._func_fail_default_resp()
+        output = await self._run_gpt35_wo_extra_prompt(prompt)
+        logger.info(f"Role: {init_role.name} Action: {self.cls_name} output: {output}")
+        return output
diff --git a/metagpt/ext/stanford_town/actions/inner_voice_action.py b/metagpt/ext/stanford_town/actions/inner_voice_action.py
new file mode 100644
index 0000000000..83cfa037ba
--- /dev/null
+++ b/metagpt/ext/stanford_town/actions/inner_voice_action.py
@@ -0,0 +1,35 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc :
+
+from metagpt.ext.stanford_town.actions.st_action import STAction
+from metagpt.logs import logger
+
+
+class AgentWhisperThoughtAction(STAction):
+    name: str = "AgentWhisperThoughtAction"
+
+    def _func_validate(self, llm_resp: str, prompt: str) -> bool:
+        try:
+            self._func_cleanup(llm_resp, prompt)
+            return True
+        except Exception:
+            return False
+
+    def _func_cleanup(self, llm_resp: str, prompt: str = "") -> str:
+        return llm_resp.split('"')[0].strip()
+
+    def _func_fail_default_resp(self) -> str:
+        pass
+
+    async def run(self, role: "STRole", statements: str, test_input=None, verbose=False) -> str:
+        def create_prompt_input(role: "STRole",
statements, test_input=None):
+            prompt_input = [role.scratch.name, statements]
+            return prompt_input
+
+        prompt_input = create_prompt_input(role, statements)
+        prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "whisper_inner_thought_v1.txt")
+
+        output = await self._run_gpt35_max_tokens(prompt, max_tokens=50)
+        logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}")
+        return output
diff --git a/metagpt/ext/stanford_town/actions/new_decomp_schedule.py b/metagpt/ext/stanford_town/actions/new_decomp_schedule.py
new file mode 100644
index 0000000000..759ec170f4
--- /dev/null
+++ b/metagpt/ext/stanford_town/actions/new_decomp_schedule.py
@@ -0,0 +1,154 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc : new_decomp_schedule
+
+import datetime
+
+from metagpt.ext.stanford_town.actions.st_action import STAction
+from metagpt.logs import logger
+
+
+class NewDecompSchedule(STAction):
+    name: str = "NewDecompSchedule"
+
+    def _func_validate(self, llm_resp: str, prompt: str) -> bool:
+        resp = False
+        try:
+            llm_resp = self._func_cleanup(llm_resp, prompt)
+            dur_sum = 0
+            for act, dur in llm_resp:
+                dur_sum += dur
+                if not isinstance(act, str):
+                    return False
+                if not isinstance(dur, int):
+                    return False
+            x = prompt.split("\n")[0].split("originally planned schedule from")[-1].strip()[:-1]
+            x = [datetime.datetime.strptime(i.strip(), "%H:%M %p") for i in x.split(" to ")]
+            delta_min = int((x[1] - x[0]).total_seconds() / 60)
+
+            if int(dur_sum) != int(delta_min):
+                return False
+            resp = True
+        except Exception:
+            pass
+        return resp
+
+    def _func_cleanup(self, llm_resp: str, prompt: str) -> list:
+        new_schedule = prompt + " " + llm_resp.strip()
+        new_schedule = new_schedule.split("The revised schedule:")[-1].strip()
+        new_schedule = new_schedule.split("\n")
+
+        ret_temp = []
+        for i in new_schedule:
+            ret_temp += [i.split(" -- ")]
+
+        ret = []
+        for time_str, action in ret_temp:
+            start_time = time_str.split(" ~ ")[0].strip()
+            end_time = time_str.split(" ~ ")[1].strip()
+            delta = datetime.datetime.strptime(end_time, "%H:%M") - datetime.datetime.strptime(start_time, "%H:%M")
+            delta_min = int(delta.total_seconds() / 60)
+            if delta_min < 0:
+                delta_min = 0
+            ret += [[action, delta_min]]
+
+        return ret
+
+    def _func_fail_default_resp(self, main_act_dur: list, truncated_act_dur: list) -> list:
+        dur_sum = 0
+        for act, dur in main_act_dur:
+            dur_sum += dur
+
+        ret = truncated_act_dur[:]
+        ret += main_act_dur[len(ret) - 1 :]
+
+        # If there is excess duration, we need to trim...
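+        # Illustrative example: with main_act_dur = [["nap", 60], ["read", 60]]
+        # (120 minutes) and truncated_act_dur = [["nap", 30], ["phone call", 45]],
+        # ret starts as [["nap", 30], ["phone call", 45], ["read", 60]] (135 minutes),
+        # so the loop below finds a 15-minute excess and shortens "read" to 45.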
+ ret_dur_sum = 0 + count = 0 + over = None + for act, dur in ret: + ret_dur_sum += dur + if ret_dur_sum == dur_sum: + break + if ret_dur_sum > dur_sum: + over = ret_dur_sum - dur_sum + break + count += 1 + + if over: + ret = ret[: count + 1] + ret[-1][1] -= over + + return ret + + async def run( + self, + role: "STRole", + main_act_dur: int, + truncated_act_dur: int, + start_time_hour: datetime, + end_time_hour: datetime, + inserted_act: str, + inserted_act_dur: int, + *args, + **kwargs, + ): + def create_prompt_input( + role: "STRole", + main_act_dur: int, + truncated_act_dur: int, + start_time_hour: datetime, + end_time_hour: datetime, + inserted_act: str, + inserted_act_dur: int, + ): + persona_name = role.name + start_hour_str = start_time_hour.strftime("%H:%M %p") + end_hour_str = end_time_hour.strftime("%H:%M %p") + + original_plan = "" + for_time = start_time_hour + for i in main_act_dur: + original_plan += ( + f'{for_time.strftime("%H:%M")} ~ ' + f'{(for_time + datetime.timedelta(minutes=int(i[1]))).strftime("%H:%M")} -- ' + i[0] + ) + original_plan += "\n" + for_time += datetime.timedelta(minutes=int(i[1])) + + new_plan_init = "" + for_time = start_time_hour + for count, i in enumerate(truncated_act_dur): + new_plan_init += ( + f'{for_time.strftime("%H:%M")} ~ ' + f'{(for_time + datetime.timedelta(minutes=int(i[1]))).strftime("%H:%M")} -- ' + i[0] + ) + new_plan_init += "\n" + if count < len(truncated_act_dur) - 1: + for_time += datetime.timedelta(minutes=int(i[1])) + + new_plan_init += (for_time + datetime.timedelta(minutes=int(i[1]))).strftime("%H:%M") + " ~" + + prompt_input = [ + persona_name, + start_hour_str, + end_hour_str, + original_plan, + persona_name, + inserted_act, + inserted_act_dur, + persona_name, + start_hour_str, + end_hour_str, + end_hour_str, + new_plan_init, + ] + return prompt_input + + prompt_input = create_prompt_input( + role, main_act_dur, truncated_act_dur, start_time_hour, end_time_hour, inserted_act, inserted_act_dur + ) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "new_decomp_schedule_v1.txt") + self.fail_default_resp = self._func_fail_default_resp(main_act_dur, truncated_act_dur) + output = await self._run_gpt35_max_tokens(prompt, max_tokens=1000) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output diff --git a/metagpt/ext/stanford_town/actions/run_reflect_action.py b/metagpt/ext/stanford_town/actions/run_reflect_action.py new file mode 100644 index 0000000000..895f6828f0 --- /dev/null +++ b/metagpt/ext/stanford_town/actions/run_reflect_action.py @@ -0,0 +1,277 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : Integration Reflect Action + +import re + +from metagpt.ext.stanford_town.actions.st_action import STAction +from metagpt.logs import logger + + +# Run GPT Prompt Focal Point method +class AgentFocusPt(STAction): + name: str = "AgentFocusPt" + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + try: + self._func_cleanup(llm_resp, prompt) + return True + except Exception: + return False + + def _func_cleanup(self, llm_resp: str, prompt: str = "") -> str: + try: + """ + Cleanup handling has been completed for run_v2 + """ + return llm_resp + except Exception as exp: + logger.error(f"{self.cls_name} with error {exp}") + + def _func_fail_default_resp(self) -> str: + pass + + async def run(self, role: "STRole", statements: str, n: int, test_input=None) -> str: + def create_prompt_input(role: "STRole", statements, n, test_input=None): + prompt_input = 
[statements, str(n)] + return prompt_input + + prompt_input = create_prompt_input(role, statements, n) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "generate_focal_pt_v1.txt") + + example_output = '["What should Jane do for lunch", "Does Jane like strawberry", "Who is Jane"]' + special_instruction = "Output must be a list of str." + output = await self._run_gpt35(prompt, example_output, special_instruction) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +# Run GPT Prompt Insight and Guidance +class AgentInsightAndGuidance(STAction): + name: str = "AgentInsightAndGuidance" + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + try: + self._func_cleanup(llm_resp, prompt) + return True + except Exception: + return False + + def _func_cleanup(self, llm_resp: str, prompt: str = "") -> dict: + try: + llm_resp = "1. " + llm_resp.strip() + ret = dict() + for i in llm_resp.split("\n"): + row = " ".join(i.split(". ")[1:]) + if "(because of " not in row: + continue + thought = row.split("(because of ")[0].strip() + if ")" not in row.split("(because of ")[1]: + continue + evi_raw = row.split("(because of ")[1].split(")")[0].strip() + evi_raw = re.findall(r"\d+", evi_raw) + evi_raw = [int(i.strip()) for i in evi_raw] + ret[thought] = evi_raw + return ret + except Exception as exp: + logger.error(f"{self.cls_name} with error {exp}") + + def _func_fail_default_resp(self, n: int) -> str: + return ["I am hungry"] * n + + async def run(self, role: "STRole", statements: str, n: int, test_input=None) -> dict: + def create_prompt_input(role, statements, n, test_input=None): + prompt_input = [statements, str(n)] + return prompt_input + + prompt_input = create_prompt_input(role, statements, n) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "insight_and_evidence_v1.txt") + + self.fail_default_resp = self._func_fail_default_resp(n) + output = await self._run_gpt35_max_tokens(prompt, max_tokens=150) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +# Run GPT Prompt Event Triple +class AgentEventTriple(STAction): + name: str = "AgentEventTriple" + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + try: + llm_resp = self._func_cleanup(llm_resp, prompt="") + if len(llm_resp) != 2: + return False + except Exception: + return False + return True + + def _func_cleanup(self, llm_resp: str, prompt: str = "") -> list: + try: + cr = llm_resp.strip() + cr = [i.strip() for i in cr.split(")")[0].split(",")] + if len(cr) != 2: + return cr[-2:] + return cr + except Exception as exp: + logger.error(f"{self.cls_name} with error {exp}") + + def _func_fail_default_resp(self) -> str: + pass + + async def run(self, statements: str, role: "STRole", verbose=False) -> tuple: + def create_prompt_input(statements, role): + if "(" in statements: + statements = statements.split("(")[-1].split(")")[0] + prompt_input = [role.scratch.name, statements, role.scratch.name] + return prompt_input + + prompt_input = create_prompt_input(statements, role) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "generate_event_triple_v1.txt") + + output = await self._run_gpt35_max_tokens(prompt, max_tokens=30) + output = (role.scratch.name, output[0], output[1]) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +# Run GPT Prompt Event Poignancy +class AgentEventPoignancy(STAction): + name: str = "AgentEventPoignancy" + + def 
_func_validate(self, llm_resp: str, prompt: str) -> bool: + try: + self._func_cleanup(llm_resp, prompt) + return True + except Exception: + return False + + def _func_cleanup(self, llm_resp: str, prompt: str = "") -> int: + try: + llm_resp = int(llm_resp.strip()) + return llm_resp + except Exception as exp: + logger.error(f"{self.cls_name} with error {exp}") + + def _func_fail_default_resp(self) -> str: + pass + + async def run(self, role: "STRole", statements: str, test_input=None, verbose=False) -> str: + def create_prompt_input(role: "STRole", statements: str, test_input=None): + prompt_input = [role.scratch.name, role.scratch.get_str_iss(), role.scratch.name, statements] + return prompt_input + + prompt_input = create_prompt_input(role, statements) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "poignancy_event_v1.txt") + + example_output = "5" # ######## + special_instruction = "The output should ONLY contain ONE integer value on the scale of 1 to 10." + output = await self._run_gpt35(prompt, example_output, special_instruction) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +# Run GPT Prompt Chat Poignancy +class AgentChatPoignancy(STAction): + name: str = "AgentChatPoignancy" + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + try: + self._func_cleanup(llm_resp, prompt) + return True + except Exception: + return False + + def _func_cleanup(self, llm_resp: str, prompt: str = "") -> int: + try: + llm_resp = int(llm_resp.strip()) + return llm_resp + except Exception as exp: + logger.error(f"{self.cls_name} with error {exp}") + + def _func_fail_default_resp(self) -> str: + pass + + async def run(self, role: "STRole", statements: str, test_input=None, verbose=False) -> str: + def create_prompt_input(role: "STRole", statements, test_input=None): + prompt_input = [role.scratch.name, role.scratch.get_str_iss(), role.scratch.name, statements] + return prompt_input + + prompt_input = create_prompt_input(role, statements) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "poignancy_chat_v1.txt") + + example_output = "5" # ######## + special_instruction = "The output should ONLY contain ONE integer value on the scale of 1 to 10." 
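+        # Illustrative example: _run_gpt35 wraps the prompt so the model answers in
+        # the form {"output": "5"}; _func_cleanup then coerces the value via
+        # int("5".strip()), any non-integer reply fails validation, and once the
+        # retries are exhausted _run_gpt35 returns False.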
+ output = await self._run_gpt35(prompt, example_output, special_instruction) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +# Run GPT Prompt Planning Thought on Convo +class AgentPlanThoughtOnConvo(STAction): + name: str = "AgentPlanThoughtOnConvo" + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + try: + self._func_cleanup(llm_resp, prompt) + return True + except Exception: + return False + + def _func_cleanup(self, llm_resp: str, prompt: str = "") -> str: + try: + return llm_resp.split('"')[0].strip() + except Exception as exp: + logger.error(f"{self.cls_name} with error {exp}") + + def _func_fail_default_resp(self) -> str: + pass + + async def run(self, role: "STRole", statements: str, test_input=None, verbose=False) -> str: + def create_prompt_input(role, statements, test_input=None): + prompt_input = [statements, role.scratch.name, role.scratch.name, role.scratch.name] + return prompt_input + + prompt_input = create_prompt_input(role, statements) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "planning_thought_on_convo_v1.txt") + + output = await self._run_gpt35_max_tokens(prompt, max_tokens=50) + logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}") + return output + + +# Run GPT Prompt Memory on Convo +class AgentMemoryOnConvo(STAction): + name: str = "AgentMemoryOnConvo" + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + try: + self._func_cleanup(llm_resp, prompt) + return True + except Exception: + return False + + def _func_cleanup(self, llm_resp: str, prompt: str = "") -> str: + try: + return llm_resp.split('"')[0].strip() + except Exception as exp: + logger.error(f"{self.cls_name} with error {exp}") + + def _func_fail_default_resp(self) -> str: + pass + + async def run(self, role: "STRole", statements: str, test_input=None, verbose=False) -> str: + def create_prompt_input(role, statements, test_input=None): + prompt_input = [statements, role.scratch.name, role.scratch.name, role.scratch.name] + return prompt_input + + prompt_input = create_prompt_input(role, statements) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "memo_on_convo_v1.txt") + example_output = "Jane Doe was interesting to talk to." 
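+        # Illustrative example: a raw reply like
+        # 'Jane Doe was interesting to talk to." She seemed curious' is cleaned by
+        # _func_cleanup to everything before the first double quote:
+        # 'Jane Doe was interesting to talk to.'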
+        special_instruction = (
+            "The output should ONLY contain a string that summarizes anything interesting "
+            "that the agent may have noticed"
+        )
+        output = await self._run_gpt35(prompt, example_output, special_instruction)
+        logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}")
+        return output
diff --git a/metagpt/ext/stanford_town/actions/st_action.py b/metagpt/ext/stanford_town/actions/st_action.py
new file mode 100644
index 0000000000..321676374d
--- /dev/null
+++ b/metagpt/ext/stanford_town/actions/st_action.py
@@ -0,0 +1,119 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc : StanfordTown Action
+import json
+import time
+from abc import abstractmethod
+from pathlib import Path
+from typing import Any, Optional, Union
+
+from metagpt.actions.action import Action
+from metagpt.config2 import config
+from metagpt.ext.stanford_town.utils.const import PROMPTS_DIR
+from metagpt.logs import logger
+
+
+class STAction(Action):
+    name: str = "STAction"
+    prompt_dir: Path = PROMPTS_DIR
+    fail_default_resp: Optional[str] = None
+
+    @property
+    def cls_name(self):
+        return self.__class__.__name__
+
+    @abstractmethod
+    def _func_validate(self, llm_resp: str, prompt: str):
+        raise NotImplementedError
+
+    @abstractmethod
+    def _func_cleanup(self, llm_resp: str, prompt: str):
+        raise NotImplementedError
+
+    @abstractmethod
+    def _func_fail_default_resp(self):
+        raise NotImplementedError
+
+    def generate_prompt_with_tmpl_filename(self, prompt_input: Union[str, list], tmpl_filename) -> str:
+        """
+        same with `generate_prompt`
+        Args:
+            prompt_input: the input we want to feed in (IF THERE ARE MORE THAN ONE INPUT, THIS CAN BE A LIST.)
+            tmpl_filename: prompt template filename
+        Returns:
+            a str prompt that will be sent to the LLM server.
+        """
+        if isinstance(prompt_input, str):
+            prompt_input = [prompt_input]
+        prompt_input = [str(i) for i in prompt_input]
+
+        with open(str(self.prompt_dir.joinpath(tmpl_filename)), "r") as f:
+            prompt = f.read()
+        for count, i in enumerate(prompt_input):
+            prompt = prompt.replace(f"!<INPUT {count}>!", i)
+        if "###" in prompt:
+            prompt = prompt.split("###")[1]
+        return prompt.strip()
+
+    async def _aask(self, prompt: str) -> str:
+        return await self.llm.aask(prompt)
+
+    async def _run_gpt35_max_tokens(self, prompt: str, max_tokens: int = 50, retry: int = 3):
+        for idx in range(retry):
+            try:
+                tmp_max_tokens_rsp = getattr(config.llm, "max_token", 1500)
+                setattr(config.llm, "max_token", max_tokens)
+                self.llm.use_system_prompt = False  # to make it behave like a non-chat completion
+
+                llm_resp = await self._aask(prompt)
+
+                setattr(config.llm, "max_token", tmp_max_tokens_rsp)
+                logger.info(f"Action: {self.cls_name} llm _run_gpt35_max_tokens raw resp: {llm_resp}")
+                if self._func_validate(llm_resp, prompt):
+                    return self._func_cleanup(llm_resp, prompt)
+            except Exception as exp:
+                logger.warning(f"Action: {self.cls_name} _run_gpt35_max_tokens exp: {exp}")
+                time.sleep(5)
+        return self.fail_default_resp
+
+    async def _run_gpt35(
+        self, prompt: str, example_output: str, special_instruction: str, retry: int = 3
+    ) -> Union[bool, Any]:
+        """same with `gpt_structure.ChatGPT_safe_generate_response`"""
+        prompt = '"""\n' + prompt + '\n"""\n'
+        prompt += f"Output the response to the prompt above in json. 
{special_instruction}\n" + prompt += "Example output json:\n" + prompt += '{"output": "' + str(example_output) + '"}' + + for idx in range(retry): + try: + llm_resp = await self._aask(prompt) + logger.info(f"Action: {self.cls_name} llm _run_gpt35 raw resp: {llm_resp}") + end_idx = llm_resp.strip().rfind("}") + 1 + llm_resp = llm_resp[:end_idx] + llm_resp = json.loads(llm_resp)["output"] + + if self._func_validate(llm_resp, prompt): + return self._func_cleanup(llm_resp, prompt) + except Exception as exp: + logger.warning(f"Action: {self.cls_name} _run_gpt35 exp: {exp}") + time.sleep(5) # usually avoid `Rate limit` + return False + + async def _run_gpt35_wo_extra_prompt(self, prompt: str, retry: int = 3) -> str: + for idx in range(retry): + try: + llm_resp = await self._aask(prompt) + llm_resp = llm_resp.strip() + logger.info(f"Action: {self.cls_name} llm _run_gpt35_wo_extra_prompt raw resp: {llm_resp}") + if self._func_validate(llm_resp, prompt): + return self._func_cleanup(llm_resp, prompt) + except Exception as exp: + logger.warning(f"Action: {self.cls_name} _run_gpt35_wo_extra_prompt exp: {exp}") + time.sleep(5) # usually avoid `Rate limit` + return self.fail_default_resp + + async def run(self, *args, **kwargs): + """Run action""" + raise NotImplementedError("The run method should be implemented in a subclass.") diff --git a/metagpt/ext/stanford_town/actions/summarize_conv.py b/metagpt/ext/stanford_town/actions/summarize_conv.py new file mode 100644 index 0000000000..5be5fcaa43 --- /dev/null +++ b/metagpt/ext/stanford_town/actions/summarize_conv.py @@ -0,0 +1,47 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : summarize the content of agents' conversation + +from metagpt.ext.stanford_town.actions.st_action import STAction +from metagpt.logs import logger + + +class SummarizeConv(STAction): + name: str = "SummarizeConv" + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + resp = False + try: + _ = self._func_cleanup(llm_resp, prompt) + resp = True + except Exception: + pass + return resp + + def _func_cleanup(self, llm_resp: str, prompt: str) -> str: + ret = "conversing about " + llm_resp.strip() + return ret + + def _func_fail_default_resp(self) -> str: + return "conversing with a housemate about morning greetings" + + async def run(self, conv: list): + def create_prompt_input(conversation: list): + convo_str = "" + for row in conversation: + convo_str += f'{row[0]}: "{row[1]}"\n' + prompt_input = [convo_str] + return prompt_input + + prompt_input = create_prompt_input(conv) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "summarize_conversation_v1.txt") + + example_output = "conversing about what to eat for lunch" + special_instruction = ( + "The output must continue the sentence above by filling in the tag. " + "Don't start with 'this is a conversation about...' Just finish the sentence " + "but do not miss any important details (including who are chatting)." 
+ ) + output = await self._run_gpt35(prompt, example_output, special_instruction) + logger.info(f"Action: {self.cls_name} output: {output}") + return output diff --git a/metagpt/ext/stanford_town/actions/task_decomp.py b/metagpt/ext/stanford_town/actions/task_decomp.py new file mode 100644 index 0000000000..3a23a73456 --- /dev/null +++ b/metagpt/ext/stanford_town/actions/task_decomp.py @@ -0,0 +1,173 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +# @Desc : task_decomp + +import datetime + +from metagpt.ext.stanford_town.actions.st_action import STAction +from metagpt.logs import logger + + +class TaskDecomp(STAction): + name: str = "TaskDecomp" + + def _func_cleanup(self, llm_resp: str, prompt: str) -> list: + # TODO SOMETHING HERE sometimes fails... See screenshot + temp = [i.strip() for i in llm_resp.split("\n")] + _cr = [] + cr = [] + for count, i in enumerate(temp): + if count != 0: + _cr += [" ".join([j.strip() for j in i.split(" ")][3:])] + else: + _cr += [i] + for count, i in enumerate(_cr): + k = [j.strip() for j in i.split("(duration in minutes:")] + task = k[0] + if task[-1] == ".": + task = task[:-1] + duration = int(k[1].split(",")[0].strip()) + cr += [[task, duration]] + + total_expected_min = int(prompt.split("(total duration in minutes")[-1].split("):")[0].strip()) + + # TODO -- now, you need to make sure that this is the same as the sum of + # the current action sequence. + curr_min_slot = [ + ["dummy", -1], + ] # (task_name, task_index) + for count, i in enumerate(cr): + i_task = i[0] + i_duration = i[1] + + i_duration -= i_duration % 5 + if i_duration > 0: + for j in range(i_duration): + curr_min_slot += [(i_task, count)] + curr_min_slot = curr_min_slot[1:] + + if len(curr_min_slot) > total_expected_min: + last_task = curr_min_slot[60] + for i in range(1, 6): + curr_min_slot[-1 * i] = last_task + elif len(curr_min_slot) < total_expected_min: + last_task = curr_min_slot[-1] + for i in range(total_expected_min - len(curr_min_slot)): + curr_min_slot += [last_task] + + cr_ret = [ + ["dummy", -1], + ] + for task, task_index in curr_min_slot: + if task != cr_ret[-1][0]: + cr_ret += [[task, 1]] + else: + cr_ret[-1][1] += 1 + cr = cr_ret[1:] + + return cr + + def _func_validate(self, llm_resp: str, prompt: str) -> bool: + # TODO -- this sometimes generates error + try: + self._func_cleanup(llm_resp, prompt) + except Exception: + return False + return True + + def _func_fail_default_resp(self) -> int: + fs = [["asleep", 0]] + return fs + + async def run(self, role: "STRole", task_desc: int, truncated_act_dur: int, *args, **kwargs): + def create_prompt_input(role, task, duration): + """ + Today is Saturday June 25. From 00:00 ~ 06:00am, Maeve is + planning on sleeping, 06:00 ~ 07:00am, Maeve is + planning on waking up and doing her morning routine, + and from 07:00am ~08:00am, Maeve is planning on having breakfast. + """ + + curr_f_org_index = role.scratch.get_f_daily_schedule_hourly_org_index() + all_indices = [] + # if curr_f_org_index > 0: + # all_indices += [curr_f_org_index-1] + all_indices += [curr_f_org_index] + if curr_f_org_index + 1 <= len(role.scratch.f_daily_schedule_hourly_org): + all_indices += [curr_f_org_index + 1] + if curr_f_org_index + 2 <= len(role.scratch.f_daily_schedule_hourly_org): + all_indices += [curr_f_org_index + 2] + + curr_time_range = "" + + logger.debug("DEBUG") + logger.debug(role.scratch.f_daily_schedule_hourly_org) + logger.debug(all_indices) + + summ_str = f'Today is {role.scratch.curr_time.strftime("%B %d, %Y")}. 
' + summ_str += "From " + for index in all_indices: + logger.debug(f"index {index}") + if index < len(role.scratch.f_daily_schedule_hourly_org): + start_min = 0 + for i in range(index): + start_min += role.scratch.f_daily_schedule_hourly_org[i][1] + end_min = start_min + role.scratch.f_daily_schedule_hourly_org[index][1] + start_time = datetime.datetime.strptime("00:00:00", "%H:%M:%S") + datetime.timedelta( + minutes=start_min + ) + end_time = datetime.datetime.strptime("00:00:00", "%H:%M:%S") + datetime.timedelta( + minutes=end_min + ) + start_time_str = start_time.strftime("%H:%M%p") + end_time_str = end_time.strftime("%H:%M%p") + summ_str += ( + f"{start_time_str} ~ {end_time_str}, {role.name} is planning " + f"on {role.scratch.f_daily_schedule_hourly_org[index][0]}, " + ) + if curr_f_org_index + 1 == index: + curr_time_range = f"{start_time_str} ~ {end_time_str}" + summ_str = summ_str[:-2] + "." + + prompt_input = [] + prompt_input += [role.scratch.get_str_iss()] + prompt_input += [summ_str] + # prompt_input += [role.scratch.get_str_curr_date_str()] + prompt_input += [role.scratch.get_str_firstname()] + prompt_input += [role.scratch.get_str_firstname()] + prompt_input += [task] + prompt_input += [curr_time_range] + prompt_input += [duration] + prompt_input += [role.scratch.get_str_firstname()] + return prompt_input + + prompt_input = create_prompt_input(role, task_desc, truncated_act_dur) + prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "task_decomp_v3.txt") + self.fail_default_resp = self._func_fail_default_resp() + output = await self._run_gpt35_max_tokens(prompt, max_tokens=1000) + logger.info(f"Role: {role.name} {self.cls_name} output: {output}") + + fin_output = [] + time_sum = 0 + for i_task, i_duration in output: + time_sum += i_duration + # HM????????? 
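+            # Illustrative example: with truncated_act_dur = 60 and output =
+            # [["shower", 15], ["dress", 15], ["breakfast", 40]], the check below
+            # keeps "shower" and "dress" (running total 30 <= 60), stops at
+            # "breakfast" (70 > 60), and the padding afterwards stretches the last
+            # kept subtask so the durations sum to exactly 60.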
+            # if time_sum < duration:
+            if time_sum <= truncated_act_dur:
+                fin_output += [[i_task, i_duration]]
+            else:
+                break
+        ftime_sum = 0
+        for fi_task, fi_duration in fin_output:
+            ftime_sum += fi_duration
+
+        fin_output[-1][1] += truncated_act_dur - ftime_sum
+        output = fin_output
+
+        task_decomp = output
+        ret = []
+        for decomp_task, duration in task_decomp:
+            ret += [[f"{task_desc} ({decomp_task})", duration]]
+        output = ret
+        logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}")
+        return output
diff --git a/metagpt/ext/stanford_town/actions/wake_up.py b/metagpt/ext/stanford_town/actions/wake_up.py
new file mode 100644
index 0000000000..ea44cd3a42
--- /dev/null
+++ b/metagpt/ext/stanford_town/actions/wake_up.py
@@ -0,0 +1,42 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc : wake_up
+
+
+from metagpt.ext.stanford_town.actions.st_action import STAction
+from metagpt.logs import logger
+
+
+class WakeUp(STAction):
+    name: str = "WakeUp"
+
+    def _func_validate(self, llm_resp: str, prompt: str = None) -> bool:
+        try:
+            self._func_cleanup(llm_resp, prompt="")
+        except Exception:
+            return False
+        return True
+
+    def _func_cleanup(self, llm_resp: str, prompt: str) -> int:
+        cr = int(llm_resp.strip().lower().split("am")[0])
+        return cr
+
+    def _func_fail_default_resp(self) -> int:
+        fs = 8
+        return fs
+
+    async def run(self, role: "STRole"):
+        def create_prompt_input(role):
+            prompt_input = [
+                role.scratch.get_str_iss(),
+                role.scratch.get_str_lifestyle(),
+                role.scratch.get_str_firstname(),
+            ]
+            return prompt_input
+
+        prompt_input = create_prompt_input(role)
+        prompt = self.generate_prompt_with_tmpl_filename(prompt_input, "wake_up_hour_v1.txt")
+        self.fail_default_resp = self._func_fail_default_resp()
+        output = await self._run_gpt35_max_tokens(prompt, max_tokens=5)
+        logger.info(f"Role: {role.name} Action: {self.cls_name} output: {output}")
+        return output
diff --git a/metagpt/ext/stanford_town/memory/__init__.py b/metagpt/ext/stanford_town/memory/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/metagpt/ext/stanford_town/memory/agent_memory.py b/metagpt/ext/stanford_town/memory/agent_memory.py
new file mode 100644
index 0000000000..d212232f42
--- /dev/null
+++ b/metagpt/ext/stanford_town/memory/agent_memory.py
@@ -0,0 +1,378 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc : BasicMemory and AgentMemory implementations
+
+from datetime import datetime
+from pathlib import Path
+from typing import Optional
+
+from pydantic import Field, field_serializer, model_validator
+
+from metagpt.logs import logger
+from metagpt.memory.memory import Memory
+from metagpt.schema import Message
+from metagpt.utils.common import read_json_file, write_json_file
+
+
+class BasicMemory(Message):
+    """
+    BasicMemory inherits from MetaGPT's Message class; its content attribute stands in for the description attribute.
+    Message supports the Chat type well, but offers little for an individual agent's Perceive, Reflect and Plan.
+    For the type design we keep GA's three kinds, with dedicated handling for Chat-type conversations still to be designed.
+    """
+
+    memory_id: Optional[str] = Field(default=None)  # memory id
+    memory_count: int = -1  # running index of this memory; equals the current size of Memory
+    type_count: int = -1  # running index within this memory type (integer)
+    memory_type: Optional[str] = Field(default=None)  # memory type: one of event, thought, chat
+    depth: int = -1  # memory depth (integer)
+    created: Optional[datetime] = Field(default=None)  # creation time
+    expiration: Optional[datetime] = Field(default=None)  # expiration time; None by default
+    last_accessed: Optional[datetime] = Field(default=None)  # last access time; initialized to self.created
+    subject: Optional[str] = Field(default=None)  # subject
+    predicate: Optional[str] = Field(default=None)  # predicate
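+    # Illustrative example: a node for "Klaus is reading" would carry
+    # subject="Klaus", predicate="is", object="reading", with description and
+    # embedding_key mirroring the content and poignancy scoring its importance.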
object: Optional[str] = Field(default=None)  # grammatical object
+
+    description: Optional[str] = Field(default=None)
+    embedding_key: Optional[str] = Field(default=None)  # same as self.content
+    poignancy: int = -1  # importance score
+    keywords: list[str] = Field(default=[])  # keywords
+    filling: list = Field(default=[])  # list of the memory_ids associated with this node
+
+    __hash__ = object.__hash__  # support hash in AgentMemory
+
+    @model_validator(mode="before")
+    @classmethod
+    def check_values(cls, values):
+        if "created" in values:
+            values["last_accessed"] = values["created"]
+        if "content" in values:
+            values["description"] = values["content"]
+        if "filling" in values:
+            values["filling"] = values["filling"] or []
+        return values
+
+    @field_serializer("created", "expiration")
+    def transform_time_field(self, time_field: Optional[datetime]) -> str:
+        if time_field:
+            time_field = time_field.strftime("%Y-%m-%d %H:%M:%S")
+        return time_field
+
+    def summary(self):
+        return self.subject, self.predicate, self.object
+
+    def save_to_dict(self) -> dict:
+        """
+        Convert a BasicMemory instance into a dict for saving to a json file.
+        Note that cause_by is not compatible with GA, so a format conversion is needed here.
+        """
+        memory_dict = dict()
+        node_id = self.memory_id
+        basic_mem_obj = self.model_dump(
+            include=[
+                "node_count",
+                "type_count",
+                "type",
+                "depth",
+                "created",
+                "expiration",
+                "subject",
+                "predicate",
+                "object",
+                "description",
+                "embedding_key",
+                "poignancy",
+                "keywords",
+                "filling",
+                "cause_by",
+            ]
+        )
+
+        memory_dict[node_id] = basic_mem_obj
+        return memory_dict
+
+
+class AgentMemory(Memory):
+    """
+    GA stores three kinds of JSON:
+    1. embedding.json (dict of embedding_key -> embedding)
+    2. nodes.json (dict of node_id -> node)
+    3. kw_strength.json
+    """
+
+    storage: list[BasicMemory] = []  # overrides storage to hold all BasicMemory nodes
+    event_list: list[BasicMemory] = []  # event memories
+    thought_list: list[BasicMemory] = []  # thought memories
+    chat_list: list[BasicMemory] = []  # chat-related memory
+
+    event_keywords: dict[str, list[BasicMemory]] = dict()  # keyword indexes
+    thought_keywords: dict[str, list[BasicMemory]] = dict()
+    chat_keywords: dict[str, list[BasicMemory]] = dict()
+
+    kw_strength_event: dict[str, int] = dict()
+    kw_strength_thought: dict[str, int] = dict()
+
+    memory_saved: Optional[Path] = Field(default=None)
+    embeddings: dict[str, list[float]] = dict()
+
+    def set_mem_path(self, memory_saved: Path):
+        self.memory_saved = memory_saved
+        self.load(memory_saved)
+
+    def save(self, memory_saved: Path):
+        """
+        Save the BasicMemory nodes in nodes.json form, reproducing GA's kw_strength.json format.
+        Only a target path needs to be supplied.
+        TODO nodes are stored in reverse order here; still to be verified (test_memory passes)
+        """
+        memory_json = dict()
+        for i in range(len(self.storage)):
+            memory_node = self.storage[len(self.storage) - i - 1]
+            memory_node = memory_node.save_to_dict()
+            memory_json.update(memory_node)
+        write_json_file(memory_saved.joinpath("nodes.json"), memory_json)
+        write_json_file(memory_saved.joinpath("embeddings.json"), self.embeddings)
+
+        strength_json = dict()
+        strength_json["kw_strength_event"] = self.kw_strength_event
+        strength_json["kw_strength_thought"] = self.kw_strength_thought
+        write_json_file(memory_saved.joinpath("kw_strength.json"), strength_json)
+
+    def load(self, memory_saved: Path):
+        """
+        Parse GA's JSON files and populate this AgentMemory instance.
+        """
+        self.embeddings = read_json_file(memory_saved.joinpath("embeddings.json"))
+        memory_load = read_json_file(memory_saved.joinpath("nodes.json"))
+        for count in range(len(memory_load.keys())):
+            node_id = f"node_{str(count + 1)}"
+            node_details = memory_load[node_id]
+            node_type = node_details["type"]
+            created = datetime.strptime(node_details["created"], "%Y-%m-%d %H:%M:%S")
+            expiration = None
+            if node_details["expiration"]:
+                expiration = datetime.strptime(node_details["expiration"], "%Y-%m-%d %H:%M:%S")
+
+            s = node_details["subject"]
+            p = node_details["predicate"]
+            o = node_details["object"]
+
+            description = node_details["description"]
+            embedding_pair = (node_details["embedding_key"], self.embeddings[node_details["embedding_key"]])
+            poignancy = node_details["poignancy"]
+            keywords = set(node_details["keywords"])
+            filling = node_details["filling"]
+            if node_type == "thought":
+                self.add_thought(
+                    created, expiration, s, p, o, description, keywords, poignancy, embedding_pair, filling
+                )
+            if node_type == "event":
+                self.add_event(created, expiration, s, p, o, description, keywords, poignancy, embedding_pair, filling)
+            if node_type == "chat":
+                self.add_chat(created, expiration, s, p, o, description, keywords, poignancy, embedding_pair, filling)
+
+        strength_keywords_load = read_json_file(memory_saved.joinpath("kw_strength.json"))
+        if strength_keywords_load["kw_strength_event"]:
+            self.kw_strength_event = strength_keywords_load["kw_strength_event"]
+        if strength_keywords_load["kw_strength_thought"]:
+            self.kw_strength_thought = strength_keywords_load["kw_strength_thought"]
+
+    def add(self, memory_basic: BasicMemory):
+        """
+        Add a new message to storage, while updating the index.
+        Overrides Memory.add: accepts a BasicMemory instead of a Message and
+        routes the node to the list matching its memory type.
+        """
+        if memory_basic.memory_id in [mem.memory_id for mem in self.storage]:
+            return
+        self.storage.append(memory_basic)
+        if memory_basic.memory_type == "chat":
+            self.chat_list[0:0] = [memory_basic]
+            return
+        if memory_basic.memory_type == "thought":
+            self.thought_list[0:0] = [memory_basic]
+            return
+        if memory_basic.memory_type == "event":
+            self.event_list[0:0] = [memory_basic]
+            return
+
+    def add_chat(
+        self, created, expiration, s, p, o, content, keywords, poignancy, embedding_pair, filling, cause_by=""
+    ):
+        """
+        Calls add() to register a chat memory; the embedding has to be computed at creation time.
+        """
+        memory_count = len(self.storage) + 1
+        type_count = len(self.chat_list) + 1
+        memory_type = "chat"
+        memory_id = f"node_{str(memory_count)}"
+        depth = 1
+
+        memory_node = BasicMemory(
+            memory_id=memory_id,
+            memory_count=memory_count,
+            type_count=type_count,
+            memory_type=memory_type,
+            depth=depth,
+            created=created,
+            expiration=expiration,
+            subject=s,
+            predicate=p,
+            object=o,
+            description=content,
+            embedding_key=embedding_pair[0],
+            poignancy=poignancy,
+            keywords=keywords,
+            filling=filling,
+            cause_by=cause_by,
+        )
+
+        keywords = [i.lower() for i in keywords]
+        for kw in keywords:
+            if kw in self.chat_keywords:
+                self.chat_keywords[kw][0:0] = [memory_node]
+            else:
+                self.chat_keywords[kw] = [memory_node]
+
+        self.add(memory_node)
+
+        self.embeddings[embedding_pair[0]] = embedding_pair[1]
+        return memory_node
+
+    def add_thought(self, created, expiration, s, p, o, content, keywords, poignancy, embedding_pair, filling):
+        """
+        Calls add() to register a thought memory.
+        """
+        memory_count = len(self.storage) + 1
+        type_count = len(self.thought_list) + 1
+        memory_type = "thought"
+        memory_id = f"node_{str(memory_count)}"
+        depth = 1
+
+        try:
+            if filling:
+                depth_list = [memory_node.depth for memory_node in self.storage if memory_node.memory_id in filling]
+                depth += max(depth_list)
+        except Exception as exp:
+            logger.warning(f"filling init error: {exp}")
+            pass
+
+        memory_node = BasicMemory(
+            memory_id=memory_id,
+            memory_count=memory_count,
+            type_count=type_count,
+            memory_type=memory_type,
+            depth=depth,
+            created=created,
+            expiration=expiration,
+            subject=s,
predicate=p,
+            object=o,
+            description=content,
+            embedding_key=embedding_pair[0],
+            poignancy=poignancy,
+            keywords=keywords,
+            filling=filling,
+        )
+
+        keywords = [i.lower() for i in keywords]
+        for kw in keywords:
+            if kw in self.thought_keywords:
+                self.thought_keywords[kw][0:0] = [memory_node]
+            else:
+                self.thought_keywords[kw] = [memory_node]
+
+        self.add(memory_node)
+
+        if f"{p} {o}" != "is idle":
+            for kw in keywords:
+                if kw in self.kw_strength_thought:
+                    self.kw_strength_thought[kw] += 1
+                else:
+                    self.kw_strength_thought[kw] = 1
+
+        self.embeddings[embedding_pair[0]] = embedding_pair[1]
+        return memory_node
+
+    def add_event(self, created, expiration, s, p, o, content, keywords, poignancy, embedding_pair, filling):
+        """
+        Calls add() to register an event memory.
+        """
+        memory_count = len(self.storage) + 1
+        type_count = len(self.event_list) + 1
+        memory_type = "event"
+        memory_id = f"node_{str(memory_count)}"
+        depth = 0
+
+        if "(" in content:
+            content = " ".join(content.split()[:3]) + " " + content.split("(")[-1][:-1]
+
+        memory_node = BasicMemory(
+            memory_id=memory_id,
+            memory_count=memory_count,
+            type_count=type_count,
+            memory_type=memory_type,
+            depth=depth,
+            created=created,
+            expiration=expiration,
+            subject=s,
+            predicate=p,
+            object=o,
+            description=content,
+            embedding_key=embedding_pair[0],
+            poignancy=poignancy,
+            keywords=keywords,
+            filling=filling,
+        )
+
+        keywords = [i.lower() for i in keywords]
+        for kw in keywords:
+            if kw in self.event_keywords:
+                self.event_keywords[kw][0:0] = [memory_node]
+            else:
+                self.event_keywords[kw] = [memory_node]
+
+        self.add(memory_node)
+
+        if f"{p} {o}" != "is idle":
+            for kw in keywords:
+                if kw in self.kw_strength_event:
+                    self.kw_strength_event[kw] += 1
+                else:
+                    self.kw_strength_event[kw] = 1
+
+        self.embeddings[embedding_pair[0]] = embedding_pair[1]
+        return memory_node
+
+    def get_summarized_latest_events(self, retention):
+        ret_set = set()
+        for e_node in self.event_list[:retention]:
+            ret_set.add(e_node.summary())
+        return ret_set
+
+    def get_last_chat(self, target_role_name: str):
+        if target_role_name.lower() in self.chat_keywords:
+            return self.chat_keywords[target_role_name.lower()][0]
+        else:
+            return False
+
+    def retrieve_relevant_thoughts(self, s_content: str, p_content: str, o_content: str) -> set:
+        contents = [s_content, p_content, o_content]
+
+        ret = []
+        for i in contents:
+            if i.lower() in self.thought_keywords:
+                ret += self.thought_keywords[i.lower()]
+
+        ret = set(ret)
+        return ret
+
+    def retrieve_relevant_events(self, s_content: str, p_content: str, o_content: str) -> set:
+        contents = [s_content, p_content, o_content]
+
+        ret = []
+        for i in contents:
+            if i.lower() in self.event_keywords:
+                ret += self.event_keywords[i.lower()]
+
+        ret = set(ret)
+        return ret
diff --git a/metagpt/ext/stanford_town/memory/retrieve.py b/metagpt/ext/stanford_town/memory/retrieve.py
new file mode 100644
index 0000000000..c4b32f9650
--- /dev/null
+++ b/metagpt/ext/stanford_town/memory/retrieve.py
@@ -0,0 +1,180 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc : Retrieve implementation
+
+import datetime
+
+from numpy import dot
+from numpy.linalg import norm
+
+from metagpt.ext.stanford_town.memory.agent_memory import BasicMemory
+from metagpt.ext.stanford_town.utils.utils import get_embedding
+
+
+def agent_retrieve(
+    agent_memory,
+    curr_time: datetime.datetime,
+    memory_forget: float,
+    query: str,
+    nodes: list[BasicMemory],
+    topk: int = 4,
+) -> list[str]:
+    """
+    Retrieve is meant to be used together with a Role, since only a Role owns the AgentMemory and scratch.
    Flow: the Role calls this function with self.rc.AgentMemory, self.rc.scratch.curr_time and self.rc.scratch.memory_forget.
    Given the query content and how many memories to look back over, it returns the ids of the top-k
    highest-scoring memories.

    Example score_list entry:
    {
        "memory": memories[i],  # a BasicMemory instance
        "importance": memories[i].poignancy,
        "recency": result of the decay-factor computation,
        "relevance": the similarity-search result,
    }
    """
+    memories = nodes
+    agent_memory_embedding = agent_memory.embeddings
+    memories = sorted(memories, key=lambda memory_node: memory_node.last_accessed, reverse=True)
+
+    score_list = []
+    score_list = extract_importance(memories, score_list)
+    score_list = extract_recency(curr_time, memory_forget, score_list)
+    score_list = extract_relevance(agent_memory_embedding, query, score_list)
+    score_list = normalize_score_floats(score_list, 0, 1)
+
+    total_dict = {}
+    gw = [1, 1, 1]  # weights of the three factors: importance, recency, relevance
+    for i in range(len(score_list)):
+        total_score = (
+            score_list[i]["importance"] * gw[0] + score_list[i]["recency"] * gw[1] + score_list[i]["relevance"] * gw[2]
+        )
+        total_dict[score_list[i]["memory"].memory_id] = total_score
+
+    result = top_highest_x_values(total_dict, topk)
+
+    return result  # a list of the top-k memory ids
+
+
+def new_agent_retrieve(role, focus_points: list, n_count=30) -> dict:
+    """
+    Takes a role, a list of focus points and the number of memories to retrieve.
+    Returns a dict mapping each focus point to its list of retrieved memories.
+    """
+    retrieved = dict()
+    for focal_pt in focus_points:
+        nodes = [
+            [i.last_accessed, i]
+            for i in role.memory.event_list + role.memory.thought_list
+            if "idle" not in i.embedding_key
+        ]
+        nodes = sorted(nodes, key=lambda x: x[0])
+        nodes = [i for created, i in nodes]
+        results = agent_retrieve(
+            role.memory, role.scratch.curr_time, role.scratch.recency_decay, focal_pt, nodes, n_count
+        )
+        final_result = []
+        for n in results:
+            for i in role.memory.storage:
+                if i.memory_id == n:
+                    i.last_accessed = role.scratch.curr_time
+                    final_result.append(i)
+
+        retrieved[focal_pt] = final_result
+
+    return retrieved
+
+
+def top_highest_x_values(d, x):
+    """
+    Takes a dict and an integer x; returns the keys of the x highest values
+    (here: memory ids), sorted by value in descending order.
+    """
+    top_v = [item[0] for item in sorted(d.items(), key=lambda item: item[1], reverse=True)[:x]]
+    return top_v
+
+
+def extract_importance(memories, score_list):
+    """
+    Extract each memory's importance score (its poignancy).
+    """
+    for i in range(len(memories)):
+        score = {"memory": memories[i], "importance": memories[i].poignancy}
+        score_list.append(score)
+    return score_list
+
+
+def extract_relevance(agent_memory_embedding, query, score_list):
+    """
+    Extract the relevance score: cosine similarity between each memory's embedding and the query embedding.
+    """
+    query_embedding = get_embedding(query)
+    # score each memory against the query
+    for i in range(len(score_list)):
+        node_embedding = agent_memory_embedding[score_list[i]["memory"].embedding_key]
+        result = cos_sim(node_embedding, query_embedding)
+        score_list[i]["relevance"] = result
+
+    return score_list
+
+
+def extract_recency(curr_time, memory_forget, score_list):
+    """
+    Extract the recency score; currently one decay factor is applied per elapsed real-world day.
+    """
+    for i in range(len(score_list)):
+        day_count = (curr_time - score_list[i]["memory"].created).days
+        score_list[i]["recency"] = memory_forget**day_count
+    return score_list
+
+
+def cos_sim(a, b):
+    """
+    Compute cosine similarity.
+    """
+    return dot(a, b) / (norm(a) * norm(b))
+
+
+def normalize_list_floats(single_list, target_min, target_max):
+    """
+    Min-max normalize a single list.
+    """
+    if len(single_list) == 0:
+        return []
+
+    min_val = min(single_list)
+    max_val = max(single_list)
+    range_val = max_val - min_val
+
+    if range_val == 0:
+        for i in range(len(single_list)):
+            single_list[i] = (target_max - target_min) / 2
+    else:
+        for i in range(len(single_list)):
+            single_list[i] = (single_list[i] - min_val) * (target_max - target_min) / range_val + target_min
+    return single_list
+
+
+def 
normalize_score_floats(score_list, target_min, target_max):
+    """
+    Normalize all three factor lists together.
+    """
+    importance_list = []
+    relevance_list = []
+    recency_list = []
+
+    for i in range(len(score_list)):
+        importance_list.append(score_list[i]["importance"])
+        relevance_list.append(score_list[i]["relevance"])
+        recency_list.append(score_list[i]["recency"])
+
+    # run the normalization
+    importance_list = normalize_list_floats(importance_list, target_min, target_max)
+    relevance_list = normalize_list_floats(relevance_list, target_min, target_max)
+    recency_list = normalize_list_floats(recency_list, target_min, target_max)
+
+    for i in range(len(score_list)):
+        score_list[i]["importance"] = importance_list[i]
+        score_list[i]["relevance"] = relevance_list[i]
+        score_list[i]["recency"] = recency_list[i]
+
+    return score_list
diff --git a/metagpt/ext/stanford_town/memory/scratch.py b/metagpt/ext/stanford_town/memory/scratch.py
new file mode 100644
index 0000000000..b4036f839f
--- /dev/null
+++ b/metagpt/ext/stanford_town/memory/scratch.py
@@ -0,0 +1,383 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc : Scratch class (persona state and profile)
+
+from datetime import datetime, timedelta
+from pathlib import Path
+from typing import Optional, Union
+
+from pydantic import BaseModel, Field, field_serializer, field_validator
+
+from metagpt.utils.common import read_json_file, write_json_file
+
+
+class Scratch(BaseModel):
+    # Category 1: persona hyperparameters
+    vision_r: int = 4
+    att_bandwidth: int = 3
+    retention: int = 5
+
+    # Category 2: world information
+    curr_time: Optional[datetime] = Field(default=None)
+    curr_tile: Optional[list[int]] = Field(default=None)
+    daily_plan_req: Optional[str] = Field(default=None)
+
+    # Category 3: the persona's core identity
+    name: Optional[str] = Field(default=None)
+    first_name: Optional[str] = Field(default=None)
+    last_name: Optional[str] = Field(default=None)
+    age: Optional[int] = Field(default=None)
+    innate: Optional[str] = Field(default=None)  # L0 permanent core traits.
+    learned: Optional[str] = Field(default=None)  # L1 stable traits.
+    currently: Optional[str] = Field(default=None)  # L2 external implementation.
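+    # Illustrative example: innate="curious, kind", learned="Klaus is a student
+    # studying gentrification", currently="Klaus is writing a research paper";
+    # the three tiers run from permanent traits to the current situation.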
+    lifestyle: Optional[str] = Field(default=None)
+    living_area: Optional[str] = Field(default=None)
+
+    # Category 4: legacy reflection variables
+    concept_forget: int = 100
+    daily_reflection_time: int = 60 * 3
+    daily_reflection_size: int = 5
+    overlap_reflect_th: int = 2
+    kw_strg_event_reflect_th: int = 4
+    kw_strg_thought_reflect_th: int = 4
+
+    # Category 5: new reflection variables
+    recency_w: int = 1
+    relevance_w: int = 1
+    importance_w: int = 1
+    recency_decay: float = 0.99
+    importance_trigger_max: int = 150
+    importance_trigger_curr: int = 150
+    importance_ele_n: int = 0
+    thought_count: int = 5
+
+    # Category 6: personal plans
+    daily_req: list[str] = Field(default=[])
+    f_daily_schedule: list[list[Union[int, str]]] = Field(default=[])
+    f_daily_schedule_hourly_org: list[list[Union[int, str]]] = Field(default=[])
+
+    # Category 7: current action
+    act_address: Optional[str] = Field(default=None)
+    act_start_time: Optional[datetime] = Field(default=None)
+    act_duration: Optional[int] = Field(default=None)
+    act_description: Optional[str] = Field(default=None)
+    act_pronunciatio: Optional[str] = Field(default=None)
+    act_event: list[Optional[str]] = [None, None, None]
+
+    act_obj_description: Optional[str] = Field(default=None)
+    act_obj_pronunciatio: Optional[str] = Field(default=None)
+    act_obj_event: list[Optional[str]] = [None, None, None]
+
+    chatting_with: Optional[str] = Field(default=None)
+    chat: Optional[str] = Field(default=None)
+    chatting_with_buffer: dict = dict()
+    chatting_end_time: Optional[datetime] = Field(default=None)
+
+    act_path_set: bool = False
+    planned_path: list[list[int]] = Field(default=[])
+
+    @field_validator("curr_time", "act_start_time", "chatting_end_time", mode="before")
+    @classmethod
+    def check_time_field(cls, time_field):
+        val = datetime.strptime(time_field, "%B %d, %Y, %H:%M:%S") if time_field else None
+        return val
+
+    @field_serializer("curr_time", "act_start_time", "chatting_end_time")
+    def transform_time_field(self, time_field: Optional[datetime]) -> str:
+        if time_field:
+            time_field = time_field.strftime("%B %d, %Y, %H:%M:%S")
+        return time_field
+
+    @classmethod
+    def init_scratch_from_path(cls, f_saved: Path):
+        scratch_load = read_json_file(f_saved)
+        scratch = Scratch(**scratch_load)
+        return scratch
+
+    def save(self, out_json: Path):
+        """
+        Save persona's scratch.
+
+        INPUT:
+            out_json: The file where we will be saving our persona's state.
+        OUTPUT:
+            None
+        """
+        scratch = self.model_dump()
+        write_json_file(out_json, scratch, encoding="utf-8")
+
+    def get_f_daily_schedule_index(self, advance=0):
+        """
+        We get the current index of self.f_daily_schedule.
+
+        Recall that self.f_daily_schedule stores the decomposed action sequences
+        up until now, and the hourly sequences of the future action for the rest
+        of today. Given that self.f_daily_schedule is a list of list where the
+        inner list is composed of [task, duration], we continue to add up the
+        duration until we reach "if elapsed > today_min_elapsed" condition. The
+        index where we stop is the index we will return.
+
+        INPUT
+            advance: Integer value of the number minutes we want to look into the
+                     future. This allows us to get the index of a future timeframe.
+        OUTPUT
+            an integer value for the current index of f_daily_schedule.
+        """
+        # We first calculate the number of minutes elapsed today.
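+        # Illustrative example: at 06:30, today_min_elapsed = 390; with
+        # f_daily_schedule = [["sleeping", 360], ["morning routine", 60],
+        # ["breakfast", 60]], the walk below accumulates 360, then 420 > 390,
+        # so index 1 ("morning routine") is returned.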
+        today_min_elapsed = 0
+        today_min_elapsed += self.curr_time.hour * 60
+        today_min_elapsed += self.curr_time.minute
+        today_min_elapsed += advance
+
+        # We then calculate the current index based on that.
+        curr_index = 0
+        elapsed = 0
+        for task, duration in self.f_daily_schedule:
+            elapsed += duration
+            if elapsed > today_min_elapsed:
+                return curr_index
+            curr_index += 1
+
+        return curr_index
+
+    def get_f_daily_schedule_hourly_org_index(self, advance=0):
+        """
+        We get the current index of self.f_daily_schedule_hourly_org.
+        It is otherwise the same as get_f_daily_schedule_index.
+
+        INPUT
+            advance: Integer value of the number of minutes we want to look into
+                     the future. This allows us to get the index of a future timeframe.
+        OUTPUT
+            an integer value for the current index of f_daily_schedule.
+        """
+        # We first calculate the number of minutes elapsed today.
+        today_min_elapsed = 0
+        today_min_elapsed += self.curr_time.hour * 60
+        today_min_elapsed += self.curr_time.minute
+        today_min_elapsed += advance
+        # We then calculate the current index based on that.
+        curr_index = 0
+        elapsed = 0
+        for task, duration in self.f_daily_schedule_hourly_org:
+            elapsed += duration
+            if elapsed > today_min_elapsed:
+                return curr_index
+            curr_index += 1
+        return curr_index
+
+    def get_str_iss(self):
+        """
+        ISS stands for "identity stable set." This describes the commonset summary
+        of this persona -- basically, the bare minimum description of the persona
+        that gets used in almost all prompts that need to call on the persona.
+
+        INPUT
+            None
+        OUTPUT
+            the identity stable set summary of the persona in a string form.
+        EXAMPLE STR OUTPUT
+            "Name: Dolores Heitmiller
+             Age: 28
+             Innate traits: hard-edged, independent, loyal
+             Learned traits: Dolores is a painter who wants to live quietly and paint
+               while enjoying her everyday life.
+             Currently: Dolores is preparing for her first solo show. She mostly
+               works from home.
+             Lifestyle: Dolores goes to bed around 11pm, sleeps for 7 hours, eats
+               dinner around 6pm.
+             Daily plan requirement: Dolores is planning to stay at home all day and
+               never go out."
+ """ + commonset = "" + commonset += f"Name: {self.name}\n" + commonset += f"Age: {self.age}\n" + commonset += f"Innate traits: {self.innate}\n" + commonset += f"Learned traits: {self.learned}\n" + commonset += f"Currently: {self.currently}\n" + commonset += f"Lifestyle: {self.lifestyle}\n" + commonset += f"Daily plan requirement: {self.daily_plan_req}\n" + commonset += f"Current Date: {self.curr_time.strftime('%A %B %d') if self.curr_time else ''}\n" + return commonset + + def get_str_name(self): + return self.name + + def get_str_firstname(self): + return self.first_name + + def get_str_lastname(self): + return self.last_name + + def get_str_age(self): + return str(self.age) + + def get_str_innate(self): + return self.innate + + def get_str_learned(self): + return self.learned + + def get_str_currently(self): + return self.currently + + def get_str_lifestyle(self): + return self.lifestyle + + def get_str_daily_plan_req(self): + return self.daily_plan_req + + def get_str_curr_date_str(self): + return self.curr_time.strftime("%A %B %d") + + def get_curr_event(self): + if not self.act_address: + return self.name, None, None + else: + return self.act_event + + def get_curr_event_and_desc(self): + if not self.act_address: + return self.name, None, None, None + else: + return self.act_event[0], self.act_event[1], self.act_event[2], self.act_description + + def get_curr_obj_event_and_desc(self): + if not self.act_address: + return "", None, None, None + else: + return self.act_address, self.act_obj_event[1], self.act_obj_event[2], self.act_obj_description + + def add_new_action( + self, + action_address, + action_duration, + action_description, + action_pronunciatio, + action_event, + chatting_with, + chat, + chatting_with_buffer, + chatting_end_time, + act_obj_description, + act_obj_pronunciatio, + act_obj_event, + act_start_time=None, + ): + self.act_address = action_address + self.act_duration = action_duration + self.act_description = action_description + self.act_pronunciatio = action_pronunciatio + self.act_event = action_event + + self.chatting_with = chatting_with + self.chat = chat + if chatting_with_buffer: + self.chatting_with_buffer.update(chatting_with_buffer) + self.chatting_end_time = chatting_end_time + + self.act_obj_description = act_obj_description + self.act_obj_pronunciatio = act_obj_pronunciatio + self.act_obj_event = act_obj_event + + self.act_start_time = self.curr_time + + self.act_path_set = False + + def act_time_str(self): + """ + Returns a string output of the current time. + + INPUT + None + OUTPUT + A string output of the current time. + EXAMPLE STR OUTPUT + "14:05 P.M." + """ + return self.act_start_time.strftime("%H:%M %p") + + def act_check_finished(self): + """ + Checks whether the self.Action instance has finished. + + INPUT + curr_datetime: Current time. If current time is later than the action's + start time + its duration, then the action has finished. + OUTPUT + Boolean [True]: Action has finished. + Boolean [False]: Action has not finished and is still ongoing. + """ + if not self.act_address: + return True + + if self.chatting_with: + end_time = self.chatting_end_time + else: + x = self.act_start_time + if x.second != 0: + x = x.replace(second=0) + x = x + timedelta(minutes=1) + end_time = x + timedelta(minutes=self.act_duration) + + if end_time.strftime("%H:%M:%S") == self.curr_time.strftime("%H:%M:%S"): + return True + return False + + def act_summarize(self): + """ + Summarize the current action as a dictionary. 
+
+        INPUT
+            None
+        OUTPUT
+            exp: A dictionary summary of the action.
+        """
+        exp = dict()
+        exp["persona"] = self.name
+        exp["address"] = self.act_address
+        exp["start_datetime"] = self.act_start_time
+        exp["duration"] = self.act_duration
+        exp["description"] = self.act_description
+        exp["pronunciatio"] = self.act_pronunciatio
+        return exp
+
+    def act_summary_str(self):
+        """
+        Returns a string summary of the current action. Meant to be
+        human-readable.
+
+        INPUT
+            None
+        OUTPUT
+            ret: A human readable summary of the action.
+        """
+        start_datetime_str = self.act_start_time.strftime("%A %B %d -- %H:%M %p")
+        ret = f"[{start_datetime_str}]\n"
+        ret += f"Activity: {self.name} is {self.act_description}\n"
+        ret += f"Address: {self.act_address}\n"
+        ret += f"Duration in minutes (e.g., x min): {str(self.act_duration)} min\n"
+        return ret
+
+    def get_daily_schedule(self, daily_schedule: list[list[Union[str, int]]]):
+        ret = ""
+        curr_min_sum = 0
+        for row in daily_schedule:
+            curr_min_sum += row[1]
+            hour = int(curr_min_sum / 60)
+            minute = curr_min_sum % 60
+            ret += f"{hour:02}:{minute:02} || {row[0]}\n"
+        return ret
+
+    def get_str_daily_schedule_summary(self):
+        return self.get_daily_schedule(self.f_daily_schedule)
+
+    def get_str_daily_schedule_hourly_org_summary(self):
+        return self.get_daily_schedule(self.f_daily_schedule_hourly_org)
diff --git a/metagpt/ext/stanford_town/memory/spatial_memory.py b/metagpt/ext/stanford_town/memory/spatial_memory.py
new file mode 100644
index 0000000000..71b8569079
--- /dev/null
+++ b/metagpt/ext/stanford_town/memory/spatial_memory.py
@@ -0,0 +1,116 @@
+"""
+Author: Joon Sung Park (joonspk@stanford.edu)
+
+File: spatial_memory.py
+Description: Defines the MemoryTree class that serves as the agents' spatial
+memory that aids in grounding their behavior in the game world.
+"""
+from pathlib import Path
+
+from pydantic import BaseModel, Field
+
+from metagpt.logs import logger
+from metagpt.utils.common import read_json_file, write_json_file
+
+
+class MemoryTree(BaseModel):
+    tree: dict = Field(default_factory=dict)
+
+    def set_mem_path(self, f_saved: Path):
+        self.tree = read_json_file(f_saved)
+
+    def print_tree(self) -> None:
+        def _print_tree(tree, depth):
+            dash = " >" * depth
+            if isinstance(tree, list):
+                if tree:
+                    logger.info(f"{dash} {tree}")
+                return
+
+            for key, val in tree.items():
+                if key:
+                    logger.info(f"{dash} {key}")
+                _print_tree(val, depth + 1)
+
+        _print_tree(self.tree, 0)
+
+    def save(self, out_json: Path) -> None:
+        write_json_file(out_json, self.tree)
+
+    def get_str_accessible_sectors(self, curr_world: str) -> str:
+        """
+        Returns a summary string of all the sectors that the persona can access
+        within the current world.
+
+        Note that there are places a given persona cannot enter. This information
+        is provided in the persona sheet. We account for this in this function.
+
+        INPUT
+            curr_world: the name of the current world.
+        OUTPUT
+            A summary string of all the sectors that the persona can access.
+        EXAMPLE STR OUTPUT
+            "bedroom, kitchen, dining room, office, bathroom"
+        """
+        x = ", ".join(list(self.tree[curr_world].keys()))
+        return x
+
+    def get_str_accessible_sector_arenas(self, sector: str) -> str:
+        """
+        Returns a summary string of all the arenas that the persona can access
+        within the current sector.
+
+        Note that there are places a given persona cannot enter. This information
+        is provided in the persona sheet. We account for this in this function.
+
+        INPUT
+            sector: the "world:sector" address string.
+        OUTPUT
+            A summary string of all the arenas that the persona can access.
+        EXAMPLE STR OUTPUT
+            "bedroom, kitchen, dining room, office, bathroom"
+        """
+        curr_world, curr_sector = sector.split(":")
+        if not curr_sector:
+            return ""
+        x = ", ".join(list(self.tree[curr_world][curr_sector].keys()))
+        return x
+
+    def get_str_accessible_arena_game_objects(self, arena: str) -> str:
+        """
+        Get a str list of all accessible game objects that are in the given
+        arena.
+
+        INPUT
+            arena: the "world:sector:arena" address string.
+        OUTPUT
+            str list of all accessible game objects in the game arena.
+        EXAMPLE STR OUTPUT
+            "phone, charger, bed, nightstand"
+        """
+        curr_world, curr_sector, curr_arena = arena.split(":")
+
+        if not curr_arena:
+            return ""
+
+        try:
+            x = ", ".join(list(self.tree[curr_world][curr_sector][curr_arena]))
+        except Exception:
+            x = ", ".join(list(self.tree[curr_world][curr_sector][curr_arena.lower()]))
+        return x
+
+    def add_tile_info(self, tile_info: dict) -> None:
+        if tile_info["world"]:
+            if tile_info["world"] not in self.tree:
+                self.tree[tile_info["world"]] = {}
+        if tile_info["sector"]:
+            if tile_info["sector"] not in self.tree[tile_info["world"]]:
+                self.tree[tile_info["world"]][tile_info["sector"]] = {}
+        if tile_info["arena"]:
+            if tile_info["arena"] not in self.tree[tile_info["world"]][tile_info["sector"]]:
+                self.tree[tile_info["world"]][tile_info["sector"]][tile_info["arena"]] = []
+        if tile_info["game_object"]:
+            if tile_info["game_object"] not in self.tree[tile_info["world"]][tile_info["sector"]][tile_info["arena"]]:
+                self.tree[tile_info["world"]][tile_info["sector"]][tile_info["arena"]] += [tile_info["game_object"]]
diff --git a/metagpt/ext/stanford_town/plan/__init__.py b/metagpt/ext/stanford_town/plan/__init__.py
new file mode 100644
index 0000000000..2bcf8efd09
--- /dev/null
+++ b/metagpt/ext/stanford_town/plan/__init__.py
@@ -0,0 +1,3 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc :
diff --git a/metagpt/ext/stanford_town/plan/converse.py b/metagpt/ext/stanford_town/plan/converse.py
new file mode 100644
index 0000000000..8eefbc9b42
--- /dev/null
+++ b/metagpt/ext/stanford_town/plan/converse.py
@@ -0,0 +1,93 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc : conversation between two agents
+
+from typing import Tuple
+
+from metagpt.ext.stanford_town.actions.agent_chat_sum_rel import AgentChatSumRel
+from metagpt.ext.stanford_town.actions.gen_iter_chat_utt import GenIterChatUTT
+from metagpt.ext.stanford_town.memory.retrieve import new_agent_retrieve
+from metagpt.logs import logger
+
+
+async def agent_conversation(init_role: "STRole", target_role: "STRole", conv_rounds: int = 8) -> list[list[str]]:
+    curr_chat = []
+    logger.info(f"Role: {init_role.name} starts a conversation with Role: {target_role.name}")
+
+    for idx in range(conv_rounds):
+        logger.info(f"Conv round: {idx} between {init_role.name} and {target_role.name}")
+        scratch = init_role.rc.scratch
+        target_scratch = target_role.rc.scratch
+
+        focal_points = [f"{target_scratch.name}"]
+        retrieved = new_agent_retrieve(init_role, focal_points, 50)
+        relationship = await generate_summarize_agent_relationship(init_role, target_role, retrieved)
+        logger.info(f"The relationship between {init_role.name} and {target_role.name}: {relationship}")
+        last_chat = ""
+        for i in curr_chat[-4:]:
+            last_chat += ": ".join(i) + "\n"
+        if last_chat:
+            focal_points = [f"{relationship}", f"{target_scratch.name} is {target_scratch.act_description}", last_chat]
+        else:
+            focal_points = [f"{relationship}", f"{target_scratch.name} is {target_scratch.act_description}"]
+        retrieved = new_agent_retrieve(init_role, focal_points, 15)
+        utt, end = await generate_one_utterance(init_role, target_role, retrieved, curr_chat)
+
+        curr_chat += [[scratch.name, utt]]
+        if end:
+            break
+
+        focal_points = [f"{scratch.name}"]
+        retrieved = new_agent_retrieve(target_role, focal_points, 50)
+        relationship = await generate_summarize_agent_relationship(target_role, init_role, retrieved)
+        logger.info(f"The relationship between {target_role.name} and {init_role.name}: {relationship}")
+        last_chat = ""
+        for i in curr_chat[-4:]:
+            last_chat += ": ".join(i) + "\n"
+        if last_chat:
+            focal_points = [f"{relationship}", f"{scratch.name} is {scratch.act_description}", last_chat]
+        else:
+            focal_points = [f"{relationship}", f"{scratch.name} is {scratch.act_description}"]
+        retrieved = new_agent_retrieve(target_role, focal_points, 15)
+        utt, end = await generate_one_utterance(target_role, init_role, retrieved, curr_chat)
+
+        curr_chat += [[target_scratch.name, utt]]
+        if end:
+            break
+
+    logger.warning(f"Conversations between {target_role.name} and {init_role.name}:")
+    for row in curr_chat:
+        logger.info(row)
+
+    return curr_chat
+
+
+async def generate_summarize_agent_relationship(init_role: "STRole", target_role: "STRole", retrieved: dict) -> str:
+    all_embedding_keys = list()
+    for key, val in retrieved.items():
+        for i in val:
+            all_embedding_keys += [i.embedding_key]
+    all_embedding_key_str = ""
+    for i in all_embedding_keys:
+        all_embedding_key_str += f"{i}\n"
+
+    summarized_relationship = await AgentChatSumRel().run(init_role, target_role, all_embedding_key_str)
+    return summarized_relationship
+
+
+async def generate_one_utterance(init_role, target_role, retrieved: dict, curr_chat: list) -> Tuple[str, str]:
+    # Chat version optimized for speed via batch generation
+    scratch = init_role.rc.scratch
+    target_scratch = target_role.rc.scratch
+    curr_context = (
+        f"{scratch.name} "
+        + f"was {scratch.act_description} "
+        + f"when {scratch.name} "
+        + f"saw {target_scratch.name} "
+        + f"in the middle of {target_scratch.act_description}.\n"
+    )
+    curr_context += f"{scratch.name} " + "is initiating a conversation with " + f"{target_scratch.name}."
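+
+    # GenIterChatUTT is expected to return a dict holding the next utterance and an
+    # end-of-conversation flag, e.g. {"utterance": "Hi Klaus!", "end": False} (illustrative
+    # values); both entries are unpacked below.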
+    x = await GenIterChatUTT().run(init_role, target_role, retrieved, curr_context, curr_chat)
+
+    return x["utterance"], x["end"]
diff --git a/metagpt/ext/stanford_town/plan/st_plan.py b/metagpt/ext/stanford_town/plan/st_plan.py
new file mode 100644
index 0000000000..f63052fc53
--- /dev/null
+++ b/metagpt/ext/stanford_town/plan/st_plan.py
@@ -0,0 +1,706 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# @Desc : Stanford Town roles' planning execution
+
+import datetime
+import math
+import random
+from typing import Tuple, Union
+
+from metagpt.ext.stanford_town.actions.decide_to_talk import DecideToTalk
+from metagpt.ext.stanford_town.actions.gen_action_details import GenActionDetails
+from metagpt.ext.stanford_town.actions.gen_daily_schedule import GenDailySchedule
+from metagpt.ext.stanford_town.actions.gen_hourly_schedule import GenHourlySchedule
+from metagpt.ext.stanford_town.actions.new_decomp_schedule import NewDecompSchedule
+from metagpt.ext.stanford_town.actions.summarize_conv import SummarizeConv
+from metagpt.ext.stanford_town.actions.task_decomp import TaskDecomp
+from metagpt.ext.stanford_town.actions.wake_up import WakeUp
+from metagpt.ext.stanford_town.memory.retrieve import new_agent_retrieve
+from metagpt.ext.stanford_town.plan.converse import agent_conversation
+from metagpt.ext.stanford_town.utils.utils import get_embedding
+from metagpt.llm import LLM
+from metagpt.logs import logger
+
+
+async def plan(role: "STRole", roles: dict[str, "STRole"], new_day: Union[bool, str], retrieved: dict) -> str:
+    # PART 1: Generate the hourly schedule.
+    if new_day:
+        await _long_term_planning(role, new_day)
+
+    # PART 2: If the current action has expired, we want to create a new plan.
+    act_check_finished = role.scratch.act_check_finished()
+    logger.info(f"Role: {role.name} act_check_finished is {act_check_finished}")
+    if act_check_finished:
+        await _determine_action(role)
+
+    # PART 3: If you perceived an event that needs to be responded to (saw
+    # another role), and retrieved relevant information.
+    # Step 1: Retrieved may have multiple events represented in it. The first
+    #         job here is to determine which of the events we want to focus
+    #         on for the role.
+    #         <focused_event> takes the form of a dictionary like this:
+    #         dictionary {["curr_event"] = <ConceptNode>,
+    #                     ["events"] = [<ConceptNode>, ...],
+    #                     ["thoughts"] = [<ConceptNode>, ...]}
+    focused_event = False
+    if retrieved.keys():
+        focused_event = _choose_retrieved(role.name, retrieved)
+
+    # Step 2: Once we choose an event, we need to determine whether the
+    #         role will take any actions for the perceived event. There are
+    #         three possible modes of reaction returned by _should_react.
+    #         a) "chat with {target_role.name}"
+    #         b) "react"
+    #         c) False
+    logger.info(f"Role: {role.name} focused_event: {focused_event}")
+    if focused_event:
+        reaction_mode = await _should_react(role, focused_event, roles)
+        logger.info(f"Role: {role.name} reaction_mode: {reaction_mode}")
+        if reaction_mode:
+            # If we do want to chat, then we generate conversation
+            if reaction_mode[:9] == "chat with":
+                await _chat_react(role, reaction_mode, roles)
+            elif reaction_mode[:4] == "wait":
+                await _wait_react(role, reaction_mode)
+
+    # Step 3: Chat-related state clean up.
+    # If the persona is not chatting with anyone, we clean up any of the
+    # chat-related states here.
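+    # act_event is a (subject, predicate, object) triple, e.g. ("Klaus", "chat with", "Maria");
+    # its predicate slot tells us whether the current action is still a conversation.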
+    if role.rc.scratch.act_event[1] != "chat with":
+        role.rc.scratch.chatting_with = None
+        role.rc.scratch.chat = None
+        role.rc.scratch.chatting_end_time = None
+    # We want to make sure that the persona does not keep conversing with each
+    # other in an infinite loop. So, chatting_with_buffer maintains a form of
+    # buffer that makes the persona wait from talking to the same target
+    # immediately after chatting once. We keep track of the buffer value here.
+    curr_persona_chat_buffer = role.rc.scratch.chatting_with_buffer
+    for persona_name, buffer_count in curr_persona_chat_buffer.items():
+        if persona_name != role.rc.scratch.chatting_with:
+            role.rc.scratch.chatting_with_buffer[persona_name] -= 1
+
+    return role.rc.scratch.act_address
+
+
+def _choose_retrieved(role_name: str, retrieved: dict) -> Union[None, dict]:
+    """
+    Retrieved elements have multiple core "curr_events". We need to choose one
+    event to which we are going to react. We pick that event here.
+    Args:
+        role_name: Current role instance's name whose action we are determining.
+        retrieved: A dictionary of <ConceptNode>s that were retrieved from the
+                   role's associative memory. This dictionary takes the
+                   following form:
+                   dictionary[event.description] =
+                       {["curr_event"] = <ConceptNode>,
+                        ["events"] = [<ConceptNode>, ...],
+                        ["thoughts"] = [<ConceptNode>, ...]}
+    """
+    # Once we are done with the reflection, we might want to build a more
+    # complex structure here.
+
+    # We do not want to take self events... for now
+    copy_retrieved = retrieved.copy()
+    for event_desc, rel_ctx in copy_retrieved.items():
+        curr_event = rel_ctx["curr_event"]
+        if curr_event.subject == role_name:
+            del retrieved[event_desc]
+
+    # Always choose role first.
+    priority = []
+    for event_desc, rel_ctx in retrieved.items():
+        curr_event = rel_ctx["curr_event"]
+        if ":" not in curr_event.subject and curr_event.subject != role_name:
+            priority += [rel_ctx]
+    if priority:
+        return random.choice(priority)
+
+    # Skip idle.
+    for event_desc, rel_ctx in retrieved.items():
+        if "is idle" not in event_desc:
+            priority += [rel_ctx]
+    if priority:
+        return random.choice(priority)
+    return None
+
+
+async def _should_react(role: "STRole", retrieved: dict, roles: dict):
+    """
+    Determines what form of reaction the role should exhibit given the
+    retrieved values.
+    INPUT
+        role: Current <"STRole"> instance whose action we are determining.
+        retrieved: A dictionary of <ConceptNode>s that were retrieved from the
+                   role's associative memory. This dictionary takes the
+                   following form:
+                   dictionary[event.description] =
+                       {["curr_event"] = <ConceptNode>,
+                        ["events"] = [<ConceptNode>, ...],
+                        ["thoughts"] = [<ConceptNode>, ...]}
+        roles: A dictionary that contains all role names as keys, and the
+               <"STRole"> instance as values.
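+    OUTPUT
+        "chat with {target_role.name}" to start a conversation, "wait: {time}" to
+        hold until the target finishes its current action, or False for no reaction.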
+ """ + + async def lets_talk(init_role: "STRole", target_role: "STRole", retrieved: dict): + if init_role.name == target_role.name: + logger.info(f"Role: {role.name} _should_react lets_talk meet same role, return False") + return False + + scratch = init_role.rc.scratch + target_scratch = target_role.rc.scratch + if ( + not target_scratch.act_address + or not target_scratch.act_description + or not scratch.act_address + or not scratch.act_description + ): + return False + + if "sleeping" in target_scratch.act_description or "sleeping" in scratch.act_description: + return False + + if scratch.curr_time.hour == 23: + return False + + if "" in target_scratch.act_address: + return False + + if target_scratch.chatting_with or scratch.chatting_with: + return False + + if target_role.name in scratch.chatting_with_buffer: + if scratch.chatting_with_buffer[target_role.name] > 0: + return False + + if await DecideToTalk().run(init_role, target_role, retrieved): + return True + + return False + + async def lets_react(init_role: "STRole", target_role: "STRole", retrieved: dict): + if init_role.name == target_role.name: + logger.info(f"Role: {role.name} _should_react lets_react meet same role, return False") + return False + + scratch = init_role.rc.scratch + target_scratch = target_role.rc.scratch + if ( + not target_scratch.act_address + or not target_scratch.act_description + or not scratch.act_address + or not scratch.act_description + ): + return False + + if "sleeping" in target_scratch.act_description or "sleeping" in scratch.act_description: + return False + + # return False + if scratch.curr_time.hour == 23: + return False + + if "waiting" in target_scratch.act_description: + return False + if scratch.planned_path == []: + return False + + if scratch.act_address != target_scratch.act_address: + return False + + react_mode = await DecideToTalk().run(init_role, target_role, retrieved) + + if react_mode == "1": + wait_until = ( + target_scratch.act_start_time + datetime.timedelta(minutes=target_scratch.act_duration - 1) + ).strftime("%B %d, %Y, %H:%M:%S") + return f"wait: {wait_until}" + elif react_mode == "2": + return False + return "do other things" + else: + return False # "keep" + + # If the role is chatting right now, default to no reaction + scratch = role.rc.scratch + if scratch.chatting_with: + return False + if "" in scratch.act_address: + return False + + # Recall that retrieved takes the following form: + # dictionary {["curr_event"] = } + curr_event = retrieved["curr_event"] + logger.info(f"Role: {role.name} _should_react curr_event.subject: {curr_event.subject}") + + if ":" not in curr_event.subject: + # this is a role event. + if await lets_talk(role, roles[curr_event.subject], retrieved): + return f"chat with {curr_event.subject}" + react_mode = await lets_react(role, roles[curr_event.subject], retrieved) + return react_mode + return False + + +async def _chat_react(role: "STRole", reaction_mode: str, roles: dict["STRole"]): + # There are two roles -- the role who is initiating the conversation + # and the role who is the target. We get the role instances here. + init_role = role + target_role = roles[reaction_mode[9:].strip()] + + # Actually creating the conversation here. 
+    convo, duration_min = await generate_convo(init_role, target_role)
+    convo_summary = await generate_convo_summary(convo)
+    inserted_act = convo_summary
+    inserted_act_dur = duration_min
+
+    act_start_time = target_role.rc.scratch.act_start_time
+
+    curr_time = target_role.rc.scratch.curr_time
+    if curr_time.second != 0:
+        temp_curr_time = curr_time + datetime.timedelta(seconds=60 - curr_time.second)
+        chatting_end_time = temp_curr_time + datetime.timedelta(minutes=inserted_act_dur)
+    else:
+        chatting_end_time = curr_time + datetime.timedelta(minutes=inserted_act_dur)
+
+    for role, p in [("init", init_role), ("target", target_role)]:
+        if role == "init":
+            act_address = f"<persona> {target_role.name}"
+            act_event = (p.name, "chat with", target_role.name)
+            chatting_with = target_role.name
+            chatting_with_buffer = {}
+            chatting_with_buffer[target_role.name] = 800
+        elif role == "target":
+            act_address = f"<persona> {init_role.name}"
+            act_event = (p.name, "chat with", init_role.name)
+            chatting_with = init_role.name
+            chatting_with_buffer = {}
+            chatting_with_buffer[init_role.name] = 800
+
+        act_pronunciatio = "💬"
+        act_obj_description = None
+        act_obj_pronunciatio = None
+        act_obj_event = (None, None, None)
+
+        await _create_react(
+            p,
+            inserted_act,
+            inserted_act_dur,
+            act_address,
+            act_event,
+            chatting_with,
+            convo,
+            chatting_with_buffer,
+            chatting_end_time,
+            act_pronunciatio,
+            act_obj_description,
+            act_obj_pronunciatio,
+            act_obj_event,
+            act_start_time,
+        )
+
+
+async def _create_react(
+    role: "STRole",
+    inserted_act: str,
+    inserted_act_dur: int,
+    act_address: str,
+    act_event: Tuple,
+    chatting_with: str,
+    chat: list,
+    chatting_with_buffer: dict,
+    chatting_end_time: datetime.datetime,
+    act_pronunciatio: str,
+    act_obj_description: str,
+    act_obj_pronunciatio: str,
+    act_obj_event: Tuple,
+    act_start_time=None,
+):
+    p = role
+    scratch = role.rc.scratch
+
+    min_sum = 0
+    for i in range(scratch.get_f_daily_schedule_hourly_org_index()):
+        min_sum += scratch.f_daily_schedule_hourly_org[i][1]
+    start_hour = int(min_sum / 60)
+
+    if scratch.f_daily_schedule_hourly_org[scratch.get_f_daily_schedule_hourly_org_index()][1] >= 120:
+        end_hour = (
+            start_hour + scratch.f_daily_schedule_hourly_org[scratch.get_f_daily_schedule_hourly_org_index()][1] / 60
+        )
+
+    elif (
+        scratch.f_daily_schedule_hourly_org[scratch.get_f_daily_schedule_hourly_org_index()][1]
+        + scratch.f_daily_schedule_hourly_org[scratch.get_f_daily_schedule_hourly_org_index() + 1][1]
+    ):
+        end_hour = start_hour + (
+            (
+                scratch.f_daily_schedule_hourly_org[scratch.get_f_daily_schedule_hourly_org_index()][1]
+                + scratch.f_daily_schedule_hourly_org[scratch.get_f_daily_schedule_hourly_org_index() + 1][1]
+            )
+            / 60
+        )
+
+    else:
+        end_hour = start_hour + 2
+    end_hour = int(end_hour)
+
+    dur_sum = 0
+    count = 0
+    start_index = None
+    end_index = None
+    for act, dur in scratch.f_daily_schedule:
+        if dur_sum >= start_hour * 60 and start_index is None:
+            start_index = count
+        if dur_sum >= end_hour * 60 and end_index is None:
+            end_index = count
+        dur_sum += dur
+        count += 1
+
+    ret = await generate_new_decomp_schedule(p, inserted_act, inserted_act_dur, start_hour, end_hour)
+    scratch.f_daily_schedule[start_index:end_index] = ret
+    scratch.add_new_action(
+        act_address,
+        inserted_act_dur,
+        inserted_act,
+        act_pronunciatio,
+        act_event,
+        chatting_with,
+        chat,
+        chatting_with_buffer,
+        chatting_end_time,
+        act_obj_description,
+        act_obj_pronunciatio,
+        act_obj_event,
+        act_start_time,
+    )
+
+
+async def _wait_react(role: "STRole", reaction_mode: str):
+    scratch = role.rc.scratch
+
+    inserted_act = f'waiting to start {scratch.act_description.split("(")[-1][:-1]}'
+    end_time = datetime.datetime.strptime(reaction_mode[6:].strip(), "%B %d, %Y, %H:%M:%S")
+    inserted_act_dur = (
+        (end_time.minute + end_time.hour * 60) - (scratch.curr_time.minute + scratch.curr_time.hour * 60) + 1
+    )
+
+    act_address = f"<waiting> {scratch.curr_tile[0]} {scratch.curr_tile[1]}"
+    act_event = (role.name, "waiting to start", scratch.act_description.split("(")[-1][:-1])
+    chatting_with = None
+    chat = None
+    chatting_with_buffer = None
+    chatting_end_time = None
+
+    act_pronunciatio = "⌛"
+    act_obj_description = None
+    act_obj_pronunciatio = None
+    act_obj_event = (None, None, None)
+
+    await _create_react(
+        role,
+        inserted_act,
+        inserted_act_dur,
+        act_address,
+        act_event,
+        chatting_with,
+        chat,
+        chatting_with_buffer,
+        chatting_end_time,
+        act_pronunciatio,
+        act_obj_description,
+        act_obj_pronunciatio,
+        act_obj_event,
+    )
+
+
+async def generate_convo(init_role: "STRole", target_role: "STRole") -> Tuple[list, int]:
+    convo = await agent_conversation(init_role, target_role)
+    all_utt = ""
+
+    for row in convo:
+        speaker = row[0]
+        utt = row[1]
+        all_utt += f"{speaker}: {utt}\n"
+
+    convo_length = math.ceil(int(len(all_utt) / 8) / 30)
+
+    return convo, convo_length
+
+
+async def generate_convo_summary(conv: list[list[str]]) -> str:
+    conv_summary = await SummarizeConv().run(conv)
+    return conv_summary
+
+
+async def generate_new_decomp_schedule(
+    role: "STRole", inserted_act: str, inserted_act_dur: int, start_hour: int, end_hour: int
+):
+    # Step 1: Setting up the core variables for the function.
+    # <p> is the role whose schedule we are editing right now.
+    scratch = role.rc.scratch
+    # <today_min_pass> indicates the number of minutes that have passed today.
+    today_min_pass = int(scratch.curr_time.hour) * 60 + int(scratch.curr_time.minute) + 1
+
+    # Step 2: We need to create <main_act_dur> and <truncated_act_dur>.
+    main_act_dur = []
+    truncated_act_dur = []
+    dur_sum = 0  # duration sum
+    count = 0  # enumerate count
+    truncated_fin = False
+
+    logger.debug(f"DEBUG::: {scratch.name}")
+    for act, dur in scratch.f_daily_schedule:
+        if (dur_sum >= start_hour * 60) and (dur_sum < end_hour * 60):
+            main_act_dur += [[act, dur]]
+            if dur_sum <= today_min_pass:
+                truncated_act_dur += [[act, dur]]
+            elif dur_sum > today_min_pass and not truncated_fin:
+                # We need to insert that last act, duration list like this one:
+                # e.g., ['wakes up and completes her morning routine (wakes up...)', 2]
+                truncated_act_dur += [[scratch.f_daily_schedule[count][0], dur_sum - today_min_pass]]
+                truncated_act_dur[-1][-1] -= (
+                    dur_sum - today_min_pass
+                )  # DEC 7 DEBUG;.. is the +1 the right thing to do???
+                logger.debug(f"DEBUG::: {truncated_act_dur}")
+                truncated_fin = True
+        dur_sum += dur
+        count += 1
+
+    main_act_dur = main_act_dur
+
+    x = (
+        truncated_act_dur[-1][0].split("(")[0].strip()
+        + " (on the way to "
+        + truncated_act_dur[-1][0].split("(")[-1][:-1]
+        + ")"
+    )
+    truncated_act_dur[-1][0] = x
+
+    if "(" in truncated_act_dur[-1][0]:
+        inserted_act = truncated_act_dur[-1][0].split("(")[0].strip() + " (" + inserted_act + ")"
+
+    # To do inserted_act_dur+1 below is an important decision but I'm not sure
+    # if I understand the full extent of its implications. Might want to
+    # revisit.
+    truncated_act_dur += [[inserted_act, inserted_act_dur]]
+    start_time_hour = datetime.datetime(2022, 10, 31, 0, 0) + datetime.timedelta(hours=start_hour)
+    end_time_hour = datetime.datetime(2022, 10, 31, 0, 0) + datetime.timedelta(hours=end_hour)
+
+    return await NewDecompSchedule().run(
+        role, main_act_dur, truncated_act_dur, start_time_hour, end_time_hour, inserted_act, inserted_act_dur
+    )
+
+
+async def _long_term_planning(role: "STRole", new_day: Union[bool, str]):
+    """
+    Formulates the role's daily long-term plan if it is the start of a new
+    day. This basically has two components: first, we create the wake-up hour,
+    and second, we create the hourly schedule based on it.
+    INPUT
+        new_day: Indicates whether the current time signals a "First day",
+                 "New day", or False (for neither). This is important because we
+                 create the roles' long term planning on the new day.
+    """
+    # We start by creating the wake up hour for the role.
+    wake_up_hour = await WakeUp().run(role)
+    wake_up_hour = int(wake_up_hour)
+    logger.info(f"Role: {role.name} long_term_planning, wake_up_hour: {wake_up_hour}")
+
+    # When it is a new day, we start by creating the daily_req of the role.
+    # Note that the daily_req is a list of strings that describe the role's
+    # day in broad strokes.
+    if new_day == "First day":
+        # Bootstrapping the daily plan for the start of the generation:
+        # if this is the start of generation (i.e., there is no previous day's
+        # daily requirement), or if we are on a new day, we want to create a new
+        # set of daily requirements.
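+        # daily_req ends up as a broad-strokes list of strings, e.g.
+        # ["wake up and complete the morning routine at 7:00 am", "have breakfast at 7:30 am", ...].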
+        role.scratch.daily_req = await GenDailySchedule().run(role, wake_up_hour)
+        logger.info(f"Role: {role.name} daily requirements: {role.scratch.daily_req}")
+    elif new_day == "New day":
+        await revise_identity(role)
+
+        # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - TODO
+        # We need to create a new daily_req here...
+        role.scratch.daily_req = role.scratch.daily_req
+
+    # Based on the daily_req, we create an hourly schedule for the role,
+    # which is a list of todo items with a time duration (in minutes) that
+    # add up to 24 hours.
+    role.scratch.f_daily_schedule = await GenHourlySchedule().run(role, wake_up_hour)
+    logger.info(f"Role: {role.name} f_daily_schedule: {role.scratch.f_daily_schedule}")
+    role.scratch.f_daily_schedule_hourly_org = role.scratch.f_daily_schedule[:]
+
+    # Added March 4 -- adding plan to the memory.
+    thought = f"This is {role.scratch.name}'s plan for {role.scratch.curr_time.strftime('%A %B %d')}:"
+    for i in role.scratch.daily_req:
+        thought += f" {i},"
+    thought = thought[:-1] + "."
+    created = role.scratch.curr_time
+    expiration = role.scratch.curr_time + datetime.timedelta(days=30)
+    s, p, o = (role.scratch.name, "plan", role.scratch.curr_time.strftime("%A %B %d"))
+    keywords = set(["plan"])
+    thought_poignancy = 5
+    thought_embedding_pair = (thought, get_embedding(thought))
+    role.a_mem.add_thought(
+        created, expiration, s, p, o, thought, keywords, thought_poignancy, thought_embedding_pair, None
+    )
+
+
+async def _determine_action(role: "STRole"):
+    """
+    Creates the next action sequence for the role.
+    The main goal of this function is to run "add_new_action" on the role's
+    scratch space, which sets up all the action related variables for the next
+    action.
+    As a part of this, the role may need to decompose its hourly schedule as
+    needed.
+    INPUT
+        role: Current <"STRole"> instance whose action we are determining.
+    """
+
+    def determine_decomp(act_desp, act_dura):
+        """
+        Given an action description and its duration, we determine whether we need
+        to decompose it. If the action is about the agent sleeping, we generally
+        do not want to decompose it, so that's what we catch here.
+
+        INPUT:
+            act_desp: the description of the action (e.g., "sleeping")
+            act_dura: the duration of the action in minutes.
+        OUTPUT:
+            a boolean. True if we need to decompose, False otherwise.
+        """
+        if "sleep" not in act_desp and "bed" not in act_desp:
+            return True
+        elif "sleeping" in act_desp or "asleep" in act_desp or "in bed" in act_desp:
+            return False
+        elif "sleep" in act_desp or "bed" in act_desp:
+            if act_dura > 60:
+                return False
+        return True
+
+    # The goal of this function is to get us the action associated with
+    # <curr_index>. As a part of this, we may need to decompose some large
+    # chunk actions.
+    # Importantly, we try to decompose at least two hours worth of schedule at
+    # any given point.
+    curr_index = role.scratch.get_f_daily_schedule_index()
+    curr_index_60 = role.scratch.get_f_daily_schedule_index(advance=60)
+
+    logger.info(f"f_daily_schedule: {role.scratch.f_daily_schedule}")
+    # * Decompose *
+    # During the first hour of the day, we need to decompose a two hours
+    # sequence. We do that here.
+    if curr_index == 0:
+        # This portion is invoked if it is the first hour of the day.
+        act_desp, act_dura = role.scratch.f_daily_schedule[curr_index]
+        if act_dura >= 60:
+            # We decompose if the next action is longer than an hour, and fits the
+            # criteria described in determine_decomp.
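+            # e.g., ["sleeping", 420] is left whole, while ["working on her painting", 120]
+            # would be split into finer-grained subtasks by TaskDecomp.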
+            if determine_decomp(act_desp, act_dura):
+                role.scratch.f_daily_schedule[curr_index : curr_index + 1] = await TaskDecomp().run(
+                    role, act_desp, act_dura
+                )
+        if curr_index_60 + 1 < len(role.scratch.f_daily_schedule):
+            act_desp, act_dura = role.scratch.f_daily_schedule[curr_index_60 + 1]
+            if act_dura >= 60:
+                if determine_decomp(act_desp, act_dura):
+                    role.scratch.f_daily_schedule[curr_index_60 + 1 : curr_index_60 + 2] = await TaskDecomp().run(
+                        role, act_desp, act_dura
+                    )
+
+    if curr_index_60 < len(role.scratch.f_daily_schedule):
+        # If it is not the first hour of the day, this is always invoked (it is
+        # also invoked during the first hour of the day -- to double up so we can
+        # decompose two hours in one go). Of course, we need to have something to
+        # decompose as well, so we check for that too.
+        if role.scratch.curr_time.hour < 23:
+            # And we don't want to decompose after 11 pm.
+            act_desp, act_dura = role.scratch.f_daily_schedule[curr_index_60]
+            if act_dura >= 60:
+                if determine_decomp(act_desp, act_dura):
+                    role.scratch.f_daily_schedule[curr_index_60 : curr_index_60 + 1] = await TaskDecomp().run(
+                        role, act_desp, act_dura
+                    )
+    # * End of Decompose *
+
+    # Generate an <Action> instance from the action description and duration. By
+    # this point, we assume that all the relevant actions are decomposed and
+    # ready in f_daily_schedule.
+    logger.debug("f_daily_schedule before generating the next action:")
+    for i in role.scratch.f_daily_schedule:
+        logger.debug(i)
+    logger.debug(curr_index)
+    logger.debug(len(role.scratch.f_daily_schedule))
+    logger.debug(role.scratch.name)
+
+    # Pad the schedule with sleep so the durations add up to a full day (1440 minutes).
+    x_emergency = 0
+    for i in role.scratch.f_daily_schedule:
+        x_emergency += i[1]
+
+    if 1440 - x_emergency > 0:
+        logger.info(f"x_emergency__AAA: {x_emergency}")
+        role.scratch.f_daily_schedule += [["sleeping", 1440 - x_emergency]]
+
+    act_desp, act_dura = role.scratch.f_daily_schedule[curr_index]
+
+    new_action_details = await GenActionDetails().run(role, act_desp, act_dura)
+    # Adding the action to role's queue.
+    role.scratch.add_new_action(**new_action_details)
+
+
+async def revise_identity(role: "STRole"):
+    p_name = role.scratch.name
+
+    focal_points = [
+        f"{p_name}'s plan for {role.scratch.get_str_curr_date_str()}.",
+        f"Important recent events for {p_name}'s life.",
+    ]
+    retrieved = new_agent_retrieve(role, focal_points)
+
+    statements = "[Statements]\n"
+    for key, val in retrieved.items():
+        for i in val:
+            statements += f"{i.created.strftime('%A %B %d -- %H:%M %p')}: {i.embedding_key}\n"
+
+    plan_prompt = statements + "\n"
+    plan_prompt += f"Given the statements above, is there anything that {p_name} should remember as they plan for"
+    plan_prompt += f" *{role.scratch.curr_time.strftime('%A %B %d')}*? "
+    plan_prompt += "If there is any scheduling information, be as specific as possible (include date, time, and location if stated in the statement)\n\n"
+    plan_prompt += f"Write the response from {p_name}'s perspective."
+    plan_note = await LLM().aask(plan_prompt)
+
+    thought_prompt = statements + "\n"
+    thought_prompt += (
+        f"Given the statements above, how might we summarize {p_name}'s feelings about their days up to now?\n\n"
+    )
+    thought_prompt += f"Write the response from {p_name}'s perspective."
+    thought_note = await LLM().aask(thought_prompt)
+
+    currently_prompt = (
+        f"{p_name}'s status from {(role.scratch.curr_time - datetime.timedelta(days=1)).strftime('%A %B %d')}:\n"
+    )
+    currently_prompt += f"{role.scratch.currently}\n\n"
+    currently_prompt += f"{p_name}'s thoughts at the end of {(role.scratch.curr_time - datetime.timedelta(days=1)).strftime('%A %B %d')}:\n"
+    currently_prompt += (plan_note + thought_note).replace("\n", "") + "\n\n"
+    currently_prompt += f"It is now {role.scratch.curr_time.strftime('%A %B %d')}. Given the above, write {p_name}'s status for {role.scratch.curr_time.strftime('%A %B %d')} that reflects {p_name}'s thoughts at the end of {(role.scratch.curr_time - datetime.timedelta(days=1)).strftime('%A %B %d')}. Write this in third-person talking about {p_name}."
+    currently_prompt += "If there is any scheduling information, be as specific as possible (include date, time, and location if stated in the statement).\n\n"
+    currently_prompt += "Follow this format below:\nStatus: "
+    new_currently = await LLM().aask(currently_prompt)
+
+    role.scratch.currently = new_currently
+
+    daily_req_prompt = role.scratch.get_str_iss() + "\n"
+    daily_req_prompt += f"Today is {role.scratch.curr_time.strftime('%A %B %d')}. Here is {role.scratch.name}'s plan today in broad-strokes (with the time of the day. e.g., have a lunch at 12:00 pm, watch TV from 7 to 8 pm).\n\n"
+    daily_req_prompt += "Follow this format (the list should have 4~6 items but no more):\n"
+    daily_req_prompt += "1. wake up and complete the morning routine at