The trajectories/ folder is the default location that experiment results (invocations of run.py) will be written to.
At a high level, the experiments folder is organized in the following manner:
trajectories
├── <user 1> 👩‍💻
│   ├── <experiment 1> 🧪
│   │   ├── all_preds.jsonl
│   │   ├── args.yaml
│   │   ├── *.html (Webpage Files)
│   │   └── *.traj (Trajectories)
│   └── <experiment 2> 🧪
│       ├── all_preds.jsonl
│       ├── args.yaml
│       ├── *.html (Webpage Files)
│       └── *.traj (Trajectories)
├── <user 2> 👨‍💻
│   ├── <experiment 1> 🧪
│   │   └── ...
│   └── <experiment 2> 🧪
│       └── ...
...
Every experiment follows the pattern trajectories/<user name>/<experiment name>. The <user name> is automatically inferred from your system, and the <experiment name> is inferred from the arguments passed to run.py.
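The two-level layout described above can be walked programmatically. The following is a minimal sketch, assuming the default trajectories/ root and the <user name>/<experiment name> nesting; the helper name list_experiments is hypothetical and not part of SWE-agent.

```python
from pathlib import Path


def list_experiments(root="trajectories"):
    """Return (user name, experiment name) pairs found under root.

    Assumes the layout root/<user name>/<experiment name>.
    Plain files at either level are ignored.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    pairs = []
    for user_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for exp_dir in sorted(p for p in user_dir.iterdir() if p.is_dir()):
            pairs.append((user_dir.name, exp_dir.name))
    return pairs


if __name__ == "__main__":
    for user, experiment in list_experiments():
        print(f"{user}/{experiment}")
```

Run from the directory that contains trajectories/, this prints one <user name>/<experiment name> line per experiment folder.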
Each call to run.py produces a single trajectories/<user name>/<experiment name> folder containing the following assets:
- all_preds.jsonl: A single file containing all of the predictions generated for the experiment (1 prediction per task instance), where each line is formatted as follows (shown expanded here for readability):
  {
    "instance_id": "<Unique task instance ID>",
    "model_patch": "<.patch file content string>",
    "model_name_or_path": "<Model name here (Inferred from experiment configs)>"
  }
- args.yaml: A summary of the configurations for the experiment run.
- <instance_id>.traj: A .json formatted file containing the (thought, action, observation) turns generated by SWE-agent towards solving <instance_id>.
- <instance_id>.html: A single .html webpage render of the trajectory, which can be opened directly in the browser for easier viewing.
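To make the asset list concrete, here is a minimal sketch that reads all_preds.jsonl and checks that each prediction has a matching .traj and .html file in the same experiment folder. It assumes only the three keys shown above and the <instance_id>.traj / <instance_id>.html naming; the function names load_predictions and missing_assets are hypothetical helpers, not part of SWE-agent.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"instance_id", "model_patch", "model_name_or_path"}


def load_predictions(experiment_dir):
    """Parse all_preds.jsonl into a dict keyed by instance_id."""
    preds = {}
    path = Path(experiment_dir) / "all_preds.jsonl"
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = REQUIRED_KEYS - set(record)
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            preds[record["instance_id"]] = record
    return preds


def missing_assets(experiment_dir, preds):
    """List expected .traj/.html file names that are absent from the folder."""
    experiment_dir = Path(experiment_dir)
    absent = []
    for instance_id in preds:
        for suffix in (".traj", ".html"):
            # Build the name by concatenation, since an ID may contain dots.
            name = f"{instance_id}{suffix}"
            if not (experiment_dir / name).exists():
                absent.append(name)
    return absent
```

Because the .traj files are .json formatted, they can be opened with json.load as well; the sketch deliberately makes no assumption about their internal keys.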
⚠️ Notes
- Evaluation is not performed by run.py; it is a separate step. all_preds.jsonl can be passed directly to evaluation/run_eval.sh to run evaluation.