StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows

🔥 Jul 10, 2024: StateFlow is accepted at COLM 2024! Our paper can be found here: https://arxiv.org/abs/2403.11322.

🔥 Feb 29, 2024: StateFlow is implemented and integrated into AutoGen; check out the blog post and the notebook!

📚 Cite paper

Datasets

  • InterCode: InterCode provides interactive coding environments for evaluating language agents that write code. From it, we evaluate StateFlow on two datasets:

    • (1) SQL: InterCode-SQL adapts the Spider dataset for MySQL and contains 1,034 task instances. For each task, a MySQL interpreter with all relevant tables is set up inside a Docker container.
    • (2) Bash: The InterCode-Bash dataset has 200 task instances curated from the NL2Bash dataset.
  • ALFWorld: ALFWorld contains interactive TextWorld environments that parallel embodied worlds in the ALFRED dataset. The aligned environments allow agents to reason and learn high-level policies in an abstract space before solving embodied tasks through low-level actuation.
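Both benchmarks expose the same basic shape: the agent observes the environment, emits an action (a SQL query, a Bash command, or a text-game action), and receives feedback until the task is done. The sketch below illustrates that observation-action loop with a toy stub; `StubEnv`, `reset`, and `step` are illustrative names, not the benchmarks' actual APIs.

```python
# Hypothetical sketch of the observation/action loop used by interactive
# benchmarks like InterCode and ALFWorld. StubEnv is NOT a real API from
# either benchmark; it only mirrors the common reset()/step() shape.

class StubEnv:
    """Toy environment: the task is solved once the agent sends 'SELECT 1'."""

    def reset(self):
        return "Task: return the constant 1 from the database."

    def step(self, action):
        done = action.strip() == "SELECT 1"
        observation = "1" if done else "Error: unexpected query."
        reward = 1.0 if done else 0.0
        return observation, reward, done


def run_episode(env, policy, max_turns=5):
    """Drive the loop: observe, act, repeat until done or out of turns."""
    obs = env.reset()
    for _ in range(max_turns):
        action = policy(obs)
        obs, reward, done = env.step(action)
        if done:
            return reward
    return 0.0


# A trivial policy that always emits the solving query.
reward = run_episode(StubEnv(), lambda obs: "SELECT 1")
```

In StateFlow, the policy side of this loop is structured as a state machine rather than a single prompt, which is what the experiments below evaluate.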

Experiments

We recommend creating separate environments for InterCode and ALFWorld.

Both benchmarks require the installation of AutoGen:

pip install pyautogen

Then, create an "OAI_CONFIG_LIST" file and add your API key(s); this file is used to access the LLM models:

[
    {
        "model": "gpt-35-turbo-1106",
        "api_key": "Your openai key here"
    },
    {
        "model": "gpt-35-turbo-1106",
        "api_key": "Your azure key",
        "api_type": "azure",
        "base_url": "Your base url here",
        "api_version": "Your api version here"
    }
]
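The file must be strictly valid JSON (in particular, no trailing commas). A quick sanity check you can run before the experiments, assuming entries need at least "model" and "api_key" (a sketch, not part of this repo):

```python
# Sanity-check an OAI_CONFIG_LIST payload: valid JSON, and every entry
# carries the keys the config loader expects. The required-key set here
# is an assumption for this sketch.
import json

def check_config(text):
    configs = json.loads(text)  # raises ValueError on malformed JSON
    for entry in configs:
        assert "model" in entry and "api_key" in entry, f"bad entry: {entry}"
    return len(configs)

sample = '[{"model": "gpt-35-turbo-1106", "api_key": "sk-..."}]'
n = check_config(sample)  # number of usable model configs
```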

When running the experiments, make sure to change the path to the OAI_CONFIG_LIST file in the corresponding Python files (e.g., ALFWorld/stateflow.py, InterCode/flow_bash.py, InterCode/flow_sql.py):

config_list = autogen.config_list_from_json(
    "Your path to OAI_CONFIG_LIST file here",
    filter_dict={"model": model},
)
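Conceptually, `config_list_from_json` parses the JSON list and keeps only the entries matching `filter_dict`. A rough stdlib-only equivalent of that behavior (a simplified sketch, not AutoGen's implementation; AutoGen's real `filter_dict` also accepts lists of allowed values):

```python
# Simplified sketch of what autogen.config_list_from_json does with
# filter_dict: parse the JSON list, then keep entries whose fields
# exactly match the filter. Not AutoGen's actual implementation.
import json

def load_filtered(text, filter_dict):
    configs = json.loads(text)
    def matches(entry):
        return all(entry.get(k) == v for k, v in filter_dict.items())
    return [c for c in configs if matches(c)]

text = json.dumps([
    {"model": "gpt-35-turbo-1106", "api_key": "KEY1"},
    {"model": "gpt-4", "api_key": "KEY2"},
])
config_list = load_filtered(text, {"model": "gpt-35-turbo-1106"})
```

Only the entries whose "model" matches the filter survive, so the same OAI_CONFIG_LIST file can hold credentials for several models at once.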

Run InterCode

  1. Please follow the instructions in the InterCode repository to install InterCode, using the build-from-source instructions:

    git clone https://github.com/princeton-nlp/intercode.git
    cd intercode
    conda env create -f environment.yml
    conda activate intercode
  2. Once inside the intercode folder, copy the files from this repo's InterCode folder into it:

    bash ../InterCode/copy_files.sh

    We made some modifications to setup.sh and the Docker files:

    • Changed the SQL Dockerfile path to ic_spider_dbs.sql.
    • Created 4 different Docker images for the 4 different Bash tasks.
  3. Run setup.sh to create the docker images for the InterCode Bash and SQL environments.

    bash setup.sh
  4. Run StateFlow for InterCode SQL:

    bash scripts/stateflow.sh

Run ALFWorld

  1. Please follow the instructions in the ALFWorld repository to install the ALFWorld environment.

  2. Change the relevant path in stateflow.py:

    os.environ["ALFWORLD_DATA"] = "Your path to ALFWorld data here."
  3. Run StateFlow for ALFWorld:

    python stateflow.py
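ALFWorld locates its game files through the ALFWORLD_DATA environment variable, so it must be set before the environment is constructed. A minimal sketch of the pattern used in stateflow.py (the temporary directory below is a throwaway placeholder, not real ALFWorld data):

```python
import os
import tempfile

# ALFWorld reads ALFWORLD_DATA at environment-construction time, so set it
# first. A real run would point this at the downloaded ALFWorld data folder;
# the temporary directory here is only a placeholder for the sketch.
data_dir = tempfile.mkdtemp()
os.environ["ALFWORLD_DATA"] = data_dir

# Fail early with a clear message if the path is wrong.
assert os.path.isdir(os.environ["ALFWORLD_DATA"]), \
    "ALFWORLD_DATA must point to an existing directory"
```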

Citation

If you find this repo helpful, please cite our publication:

@article{wu2024stateflow,
  title={StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows},
  author={Wu, Yiran and Yue, Tianwei and Zhang, Shaokun and Wang, Chi and Wu, Qingyun},
  journal={arXiv preprint arXiv:2403.11322},
  year={2024}
}

Results

Results on InterCode SQL:

[figure: InterCode SQL results]

Results on InterCode Bash:

[figure: InterCode Bash results]

Results on ALFWorld:

[figure: ALFWorld results]

Ablation of states on the InterCode SQL dataset with GPT-3.5-Turbo:

[figure: state ablation]

StateFlow + Reflexion on ALFWorld (with 6 iterations):

[figure: StateFlow + Reflexion results]