SWE-bench

Organization for maintaining SWE-bench and related projects

📣 New: Meet mini, the 100-line AI agent that still scores 65% on SWE-bench Verified!

SWE-bench    SWE-agent    SWE-smith    mini-SWE-agent    SWE-ReX    sb-cli

Software engineering agents, benchmarks, and models.
Built and maintained by researchers from Stanford University and Princeton University.


This organization contains the source code for several projects in the SWE-* open-source ecosystem, including:

  • SWE-bench, a benchmark for evaluating AI systems on real-world GitHub issues.
  • SWE-agent, a system that automatically solves GitHub issues using an LM agent.
  • SWE-smith, a toolkit for generating SWE training data at scale.
  • mini, an AI agent written in just 100 lines of code that scores 65% on SWE-bench Verified.

Also check out the supporting infrastructure for working with SWE-* projects:

  • SWE-ReX, infrastructure supporting sandboxed code execution for AI agents.
  • sb-cli, a command-line interface for running evaluations in the cloud.
  • Mirror clones for the SWE-bench and SWE-smith repositories are available here and here.

Pinned

  1. SWE-bench (Public)

    SWE-bench [Multimodal]: Can Language Models Resolve Real-world Github Issues?

    Python, 3.2k stars, 570 forks

  2. SWE-smith (Public)

    Scaling Data for SWE-agents

    Python, 317 stars, 43 forks

  3. experiments (Public)

    Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.

    Shell, 194 stars, 216 forks

  4. sb-cli (Public)

    Run SWE-bench evaluations remotely

    Python, 31 stars
