
How can I populate the FAIL_TO_PASS and PASS_TO_PASS fields? #287

Open
ErsjanKeri opened this issue Jan 15, 2025 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@ErsjanKeri

Describe the issue

I have been trying to recreate the dataset, but for other repositories. I am first testing whether I can manage it for fastapi, since it is a big Python repository.

What I have done so far:

  1. I fetched the pull requests using the /collect/ submodule, which generated a JSONL file; let's call its location ./data/fastapi-task-instances.jsonl.all.
  2. I then went to the /versioning/ submodule and ran get_versions.py on this dataset, producing the file ./data/fastapi-task-instances.json.
  3. I converted this last dataset, containing all task instances with their proper version field added, into a Hugging Face dataset saved locally at ./data/fastapi_hf, containing only a train split (see the conversion sketch after this list). The current keys are:
  "citation": "",
  "description": "",
  "features": {
    "repo": {
      "dtype": "string",
      "_type": "Value"
    },
    "pull_number": {
      "dtype": "int64",
      "_type": "Value"
    },
    "instance_id": {
      "dtype": "string",
      "_type": "Value"
    },
    "issue_numbers": {
      "feature": {
        "dtype": "string",
        "_type": "Value"
      },
      "_type": "Sequence"
    },
    "base_commit": {
      "dtype": "string",
      "_type": "Value"
    },
    "patch": {
      "dtype": "string",
      "_type": "Value"
    },
    "test_patch": {
      "dtype": "string",
      "_type": "Value"
    },
    "problem_statement": {
      "dtype": "string",
      "_type": "Value"
    },
    "hints_text": {
      "dtype": "string",
      "_type": "Value"
    },
    "created_at": {
      "dtype": "string",
      "_type": "Value"
    },
    "version": {
      "dtype": "string",
      "_type": "Value"
    }
  },
  "homepage": "",
  "license": ""
}
  4. Next, in the /inference/make_datasets/ submodule, I ran create_text_dataset.py on this file, which also added the FAIL_TO_PASS and PASS_TO_PASS columns to the dataset. I saved the resulting dataset as ./data/fastapi-text-ds; this one contains a train and a validation split.
  5. Finally, I ran run_api.py on ./data/fastapi-text-ds, which generated a new dataset containing the responses from the LLM, saved under ./output/fastapi.jsonl; this contains the model_patch, our main goal.
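For reference, the conversion in step 3 was roughly like this (a minimal sketch using the HuggingFace datasets API, not my exact code):

    from datasets import Dataset, DatasetDict

    # Load the versioned task instances produced by get_versions.py and
    # save them as a local HuggingFace dataset with a single train split.
    ds = Dataset.from_json("./data/fastapi-task-instances.json")
    DatasetDict({"train": ds}).save_to_disk("./data/fastapi_hf")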

So the final step now would be to run:

    python3 -m swebench.harness.run_evaluation \
        --dataset_name ./data/fastapi_hf \
        --predictions_path ./outputs/output/fastapi.jsonl \
        --max_workers 4 \
        --run_id first_pilot_run \
        --split train

Assuming all the steps have been correct (please let me know if I have missed any), I am facing two errors here. The first one I solved: it was simply that dataset_name was not expected to be loaded from disk in utils.py (see the sketch below). The issue I am still facing, and only noticed late, is that the PASS_TO_PASS and FAIL_TO_PASS fields of the main dataset contain only empty strings.
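My fix was along these lines (a minimal sketch; the load_eval_dataset name and the os.path.isdir branch are my own, not something from the upstream utils.py):

    import os

    from datasets import DatasetDict, load_dataset, load_from_disk

    def load_eval_dataset(dataset_name: str, split: str):
        # If dataset_name is a local directory created with save_to_disk,
        # load it from disk instead of resolving it on the HuggingFace Hub.
        if os.path.isdir(dataset_name):
            dataset = load_from_disk(dataset_name)
            # load_from_disk returns a DatasetDict when the saved dataset
            # has named splits; select the requested one in that case.
            if isinstance(dataset, DatasetDict):
                dataset = dataset[split]
            return dataset
        return load_dataset(dataset_name, split=split)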

The error comes from test_spec.py, around line 300:

    def _from_json_or_obj(key: str) -> Any:
        """If key points to string, load with json"""
        if isinstance(instance[key], str):
            return json.loads(instance[key])
        return instance[key]

-> This happens because instance["PASS_TO_PASS"] equals "", which I believe should not be the case (json.loads("") raises a JSONDecodeError).
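For comparison, in the published SWE-bench datasets these two fields hold JSON-encoded lists of test identifiers, so I would expect something like this (the test names below are made up for illustration):

    import json

    # Hypothetical instance with the fields populated the way the
    # harness expects: JSON strings encoding lists of test ids.
    instance = {
        "FAIL_TO_PASS": json.dumps(["tests/test_foo.py::test_bar"]),
        "PASS_TO_PASS": json.dumps(["tests/test_foo.py::test_baz"]),
    }

    json.loads(instance["FAIL_TO_PASS"])  # ['tests/test_foo.py::test_bar']
    json.loads("")                        # raises json.JSONDecodeError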

I am not sure where the evaluation of the test_patch should have taken place. The paper describes why this process happens, but not exactly at which stage/part it should happen; my current understanding is sketched below.
I would be very grateful for some support on this matter: whether I have missed any steps, and how to finally run the evaluation on the dataset I have collected so far.
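As I understand the paper, these fields come from running the test suite twice, once with only the test_patch applied and once with the gold patch applied on top, and comparing per-test statuses. A rough sketch (run_tests is a purely hypothetical helper that would check out base_commit, apply the given patches, run the suite, and return a mapping from test id to "PASS"/"FAIL"):

    def compute_transition_fields(instance, run_tests):
        # Pre-patch run: only the new/updated tests from test_patch.
        before = run_tests(instance["base_commit"],
                           patches=[instance["test_patch"]])
        # Post-patch run: gold patch applied on top of the test_patch.
        after = run_tests(instance["base_commit"],
                          patches=[instance["test_patch"], instance["patch"]])
        fail_to_pass = [t for t, s in after.items()
                        if s == "PASS" and before.get(t) == "FAIL"]
        pass_to_pass = [t for t, s in after.items()
                        if s == "PASS" and before.get(t) == "PASS"]
        return fail_to_pass, pass_to_pass

But I could not find which script in the repository is responsible for this step.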
Thank you in advance

Suggest an improvement to documentation

No response

@ErsjanKeri ErsjanKeri added the documentation Improvements or additions to documentation label Jan 15, 2025
@brad-kenstler

I couldn't find any logic for populating this field anywhere. Also curious!!!

@ErsjanKeri
Author

Is the repository no longer maintained? How to populate PASS_TO_PASS and FAIL_TO_PASS is crucial for recreating the experiments from the research paper.
