📌 This repository is under construction. Some subtasks/tools are not fully supported yet.
CoSTA* is a cost-sensitive toolpath agent designed to solve multi-turn image editing tasks efficiently. It integrates Large Language Models (LLMs) and graph search algorithms to dynamically select AI tools while balancing cost and quality. Unlike traditional text-to-image models (e.g., Stable Diffusion, DALLE-3), which struggle with complex image editing workflows, CoSTA* constructs an optimal toolpath using an LLM-guided hierarchical planning strategy and an A* search-based selection process.
This repository provides:
- The official codebase for CoSTA*.
- Scripts to generate and optimize toolpaths for multi-turn image editing.
Try out CoSTA* online: Live Demo
We provide a benchmark dataset with 121 images for testing CoSTA*, containing image-only and text+image tasks.
📂 Dataset: Huggingface Dataset
✅ Hierarchical Planning – Uses LLMs to decompose a task into a subtask tree which is used for constructing the final Tool Subgraph.
✅ Optimized Tool Selection – A* search is applied on the Tool Subgraph for cost-efficient, high-quality pathfinding.
✅ Multimodal Support – Switches between text and image modalities for enhanced editing.
✅ Quality Evaluation via VLM – Automatically assesses tool outputs to estimate the actual quality before progressing further.
✅ Adaptive Retry Mechanism – If the output doesn’t meet the quality threshold, it is retried with updated hyperparameters.
✅ Balancing Cost vs. Quality – A* search does not merely minimize cost; it jointly optimizes cost and quality, and users can adjust α (alpha) to control the trade-off.
✅ Supports 24 AI Tools – Integrates YOLO, GroundingDINO, Stable Diffusion, CLIP, SAM, DALL-E, and more.
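To make the cost–quality trade-off concrete, here is a minimal, illustrative sketch of an A*-style search over a toy tool graph. The exact edge weighting, heuristic, and graph construction in CoSTA* differ (see the paper); the node names, cost, and quality values below are invented for illustration only.

```python
import heapq

def astar_toolpath(graph, cost, quality, start, goal, alpha=0.5):
    """Search a tool graph where each edge (a tool invocation) carries an
    execution cost and an estimated quality in [0, 1].
    Edge weight blends the two: alpha * cost + (1 - alpha) * (1 - quality).
    The heuristic here is zero (so this reduces to Dijkstra); CoSTA* uses
    an informed heuristic in practice."""
    frontier = [(0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nxt in graph.get(node, []):
            w = alpha * cost[(node, nxt)] + (1 - alpha) * (1 - quality[(node, nxt)])
            ng = g + w
            if ng < best.get(nxt, float("inf")):
                best[nxt] = ng
                heapq.heappush(frontier, (ng, nxt, path + [nxt]))
    return float("inf"), []

# Toy graph: two candidate tools for one subtask (values are made up).
graph = {"input": ["SD", "DALLE"], "SD": ["out"], "DALLE": ["out"]}
cost = {("input", "SD"): 0.2, ("SD", "out"): 0.0,
        ("input", "DALLE"): 0.9, ("DALLE", "out"): 0.0}
quality = {("input", "SD"): 0.6, ("SD", "out"): 1.0,
           ("input", "DALLE"): 0.95, ("DALLE", "out"): 1.0}

# alpha = 1.0 weights cost only -> picks the cheaper "SD" path.
# alpha = 0.0 weights quality only -> picks the higher-quality "DALLE" path.
```

Sweeping alpha between these extremes is what lets a user trade execution cost against output quality along the selected toolpath.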
git clone https://github.com/tianyi-lab/CoSTAR.git
cd CoSTAR
Ensure you have Python 3.8+ and install the core dependencies (most tool-specific dependencies are installed automatically when the corresponding models are first run):
pip install -r requirements.txt
The required pre-trained model checkpoints must be downloaded from Google Drive and placed in the checkpoints/ folder. The download link is provided in checkpoints/checkpoints.txt.
Note: The API keys for OpenAI and StabilityAI must be set in run.py before execution. To execute CoSTA*, run:
python run.py --image path/to/image.png --prompt "Edit this image" --output output.json --output_image final.png --alpha 0
Example:
python run.py --image inputs/sample.jpg --prompt "Replace the cat with a dog and expand the image" --output Tree.json --output_image final_output.png --alpha 0
- --image: Path to the input image.
- --prompt: Editing instruction.
- --output: Path to save the generated subtask tree.
- --output_image: Path to save the final output image.
- --alpha: Cost-quality trade-off parameter.
The main functions in the following scripts need to be uncommented, and the paths, hyperparameters, and API keys must be modified before execution.
Modify subtask_tree.py by providing the input image path and prompt, then run:
python subtask_tree.py
Modify tool_subgraph.py to use the generated Tree.json, then execute:
python tool_subgraph.py
Modify astar_search.py with updated paths and hyperparameters, then run:
python astar_search.py
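During this search step, the adaptive retry mechanism listed in the features kicks in whenever a tool's output falls below the quality threshold. A minimal sketch of that loop is shown below; run_tool, vlm_quality, and the "strength" hyperparameter update are hypothetical stand-ins for illustration, not the repository's actual API.

```python
def run_with_retries(run_tool, vlm_quality, params, threshold=0.8, max_retries=3):
    """Run a tool, score its output with a VLM-based evaluator, and retry
    with adjusted hyperparameters until the quality threshold is met or
    the retry budget is exhausted. Returns the best output seen."""
    best_output, best_score = None, -1.0
    for attempt in range(max_retries):
        output = run_tool(params)
        score = vlm_quality(output)
        if score > best_score:
            best_output, best_score = output, score
        if score >= threshold:
            break  # quality threshold met; no further retries needed
        # Hypothetical hyperparameter update for the next attempt,
        # e.g. increasing an editing strength by 20%.
        params = {**params, "strength": params.get("strength", 0.5) * 1.2}
    return best_output, best_score
```

Keeping the best-scoring output across attempts means a failed retry never makes the final result worse than an earlier attempt.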
A step-by-step live example is available in Demo.ipynb, an interactive Jupyter Notebook that walks through the full workflow.
CoSTAR/
├── checkpoints/
│ ├── checkpoints.txt
├── configs/
│ ├── tools.yaml
├── inputs/
│ ├── 40.jpeg
├── outputs/
│ ├── final.png
├── prompts/
│ ├── 40.txt
├── requirements/
│ ├── craft.txt
│ ├── deblurgan.txt
│ ├── easyocr.txt
│ ├── google_cloud.txt
│ ├── groundingdino.txt
│ ├── magicbrush.txt
│ ├── realesrgan.txt
│ ├── sam.txt
│ ├── stability.txt
│ ├── yolo.txt
├── results/
│ ├── final.png
│ ├── img1.png
│ ├── img2.png
│ ├── img3.png
│ ├── img4.png
│ ├── img5.png
├── tools/
│ ├── dalleimage.py
│ ├── groundingdino.py
│ ├── sam.py
│ ├── stabilityoutpaint.py
│ ├── yolov7.py
│ └── ...
├── .gitignore
├── LICENSE
├── README.md
├── Demo.ipynb
├── run.py
├── subtask_tree.py
├── tool_subgraph.py
├── astar_search.py
If you find this work useful, please cite our paper:
@misc{gupta2025costaastcostsensitivetoolpathagent,
title={CoSTA$\ast$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing},
author={Advait Gupta and NandaKiran Velaga and Dang Nguyen and Tianyi Zhou},
year={2025},
eprint={2503.10613},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.10613},
}