SWE-smith is a toolkit for training SWE-agents. You can:
- Turn any GitHub repository into a SWE-gym.
- Create unlimited tasks (e.g., file localization, program repair, SWE-bench) for that repo.
- Train an LM to become a better SWE (e.g., SWE-agent-LM-32B).
If you're interested in turning a GitHub repository into a SWE-gym, install the package from source.
> **Tip:** SWE-smith requires Docker to create execution environments. It was developed and tested on Ubuntu 22.04.4 LTS; we do not plan on supporting Windows or macOS.
You can then build a dataset for the repository by:
- Creating an environment
- Synthesizing task instances
- Keeping only tasks that break one or more unit tests (see the sketch after this list)
- Generating issue text for your tasks
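Here is a minimal sketch of the validation idea, assuming each candidate records which tests passed before and after its bug patch; the helper below is illustrative only, not part of the SWE-smith API:

```python
# Illustrative only: keep a synthesized task iff its bug patch makes at
# least one previously passing unit test fail ("fail-to-pass" behavior).
def keep_task(passing_before: set[str], passing_after: set[str]) -> bool:
    return len(passing_before - passing_after) >= 1

# Hypothetical candidates with their test outcomes.
assert keep_task({"test_a", "test_b"}, {"test_a"})                # test_b broke -> keep
assert not keep_task({"test_a", "test_b"}, {"test_a", "test_b"})  # nothing broke -> drop
```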
Training SWE-agents with the SWE-smith dataset is straightforward:
```python
from swesmith.profiles import registry
from datasets import load_dataset

ds = load_dataset("SWE-bench/SWE-smith", split="train")  # Loads all 52k task instances
for task in ds:
    rp = registry.get_from_inst(task)   # Get the RepoProfile for the task
    container = rp.get_container(task)  # Returns a handle to a Docker container with the task initialized
    # TODO: Train!
```
SWE-smith has been used to:
- Fine-tune Qwen 2.5 Coder into SWE-agent-LM-32B (a +32% jump on SWE-bench Verified!) using SWE-agent [Tutorial]; a data-loading sketch follows this list
- Perform GRPO-style reinforcement learning using SkyRL
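As a starting point for fine-tuning, one might load the released trajectories. A minimal sketch, assuming the trajectories are hosted on Hugging Face under a name like `SWE-bench/SWE-smith-trajectories` with a chat-style `messages` field (both are assumptions; check the dataset card):

```python
from datasets import load_dataset

# Assumption: dataset name and a chat-style `messages` field.
traj = load_dataset("SWE-bench/SWE-smith-trajectories", split="train")
print(len(traj), "trajectories")
print(traj[0]["messages"][0])  # e.g., the turn that opens the episode
```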
Alongside the toolkit, we also release:
- 52k Task Instances
- SWE-agent-LM-32B, which scores 40.2% pass@1 on SWE-bench Verified!
- 26k SWE-agent Trajectories, including the 5k that SWE-agent-LM-32B was trained on
- 250+ Environments, one Docker image per repo represented in SWE-smith
And there's more coming! We're actively working on several follow-ups; check out the Contributing Guide for more.
Contact Person: John Yang, Kilian Lieret (Email: [email protected])
CC-BY-4.0. Check LICENSE for more information.
If you found SWE-smith useful, please cite:

```bibtex
@misc{yang2025swesmith,
  title={SWE-smith: Scaling Data for Software Engineering Agents},
  author={John Yang and Kilian Lieret and Carlos E. Jimenez and Alexander Wettig and Kabir Khandpur and Yanzhe Zhang and Binyuan Hui and Ofir Press and Ludwig Schmidt and Diyi Yang},
  year={2025},
  eprint={2504.21798},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2504.21798},
}
```