Describe the issue

I have been trying to recreate the dataset, but for other repositories. As a first test I am checking whether I can manage it for fastapi, since it is a large Python repository. What I have done so far:
I fetched the pull requests using the /collect/ submodule, which generated a JSONL file for me; let's call its location ./data/fastapi-task-instances.jsonl.all.
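For reference, this is roughly how I sanity-checked the collected file (just my own quick check; the keys named in the comment are only what I expect to see, not a guaranteed list):

```python
import json

# Peek at the first collected instance and list its keys.
with open("./data/fastapi-task-instances.jsonl.all") as f:
    first = json.loads(f.readline())

print(sorted(first.keys()))  # expecting e.g. repo, instance_id, base_commit, patch, test_patch, ...
```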
Afterwards I went to the /versioning/ submodule and ran get_versions.py on this dataset, which gave me the file ./data/fastapi-task-instances.json.
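To confirm the versioning step worked, I checked the new field like this (a sketch; I am assuming the versioned file is also line-delimited JSON):

```python
import json

# Every instance should now carry a non-empty "version" field.
with open("./data/fastapi-task-instances.json") as f:
    versions = {json.loads(line).get("version") for line in f}

print(versions)  # hypothetical output: {"0.65", "0.88", ...} -- no None entries expected
```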
I then converted this last dataset, containing all task instances with their proper version field added, into a Hugging Face dataset saved locally at ./data/fastapi_hf, containing only a train split. The current keys are:
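Roughly, the conversion looked like this (a sketch of my own script, not anything from the repo; I am again assuming line-delimited JSON as input):

```python
import json
from datasets import Dataset, DatasetDict

# Load the versioned task instances and wrap them in a single "train" split.
with open("./data/fastapi-task-instances.json") as f:
    instances = [json.loads(line) for line in f]

DatasetDict({"train": Dataset.from_list(instances)}).save_to_disk("./data/fastapi_hf")
```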
Afterwards, in the /inference/make_datasets/ submodule, I ran create_text_dataset.py on this file, which also added the FAIL_TO_PASS and PASS_TO_PASS columns to the dataset. I saved the resulting dataset as "./data/fastapi-text-ds"; this one contains a train and a validation split.
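This is how I looked at the result (assuming the output directory is a save_to_disk folder from the datasets library, which is how it loads for me):

```python
from datasets import load_from_disk

text_ds = load_from_disk("./data/fastapi-text-ds")
print(text_ds)                        # train and validation splits
print(text_ds["train"].column_names)  # FAIL_TO_PASS / PASS_TO_PASS should appear here
```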
Afterwards I ran run_api.py on ./data/fastapi-text-ds, which generated a new dataset containing the responses from the LLM. It is saved under "./output/fastapi.jsonl" and contains the model_patch, our main goal.
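As far as I understand, each line of that predictions file needs at least instance_id, model_name_or_path and model_patch for the evaluation harness, so I checked the first line like this:

```python
import json

# Inspect the first prediction produced by run_api.py.
with open("./output/fastapi.jsonl") as f:
    pred = json.loads(f.readline())

print(pred["instance_id"], pred["model_name_or_path"])
print(pred["model_patch"][:200])  # start of the generated diff
```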
So the final step now would be to run:

    python3 -m swebench.harness.run_evaluation \
        --dataset_name ./data/fastapi_hf \
        --predictions_path ./outputs/output/fastapi.jsonl \
        --max_workers 4 \
        --run_id first_pilot_run \
        --split train
So, assuming all the steps above are correct (please let me know if I have missed any), I am facing two errors here. The first one I solved: it was simply that dataset_name was not expected to be loaded from disk in utils.py. The issue I am still facing, and only noticed recently, is that the PASS_TO_PASS and FAIL_TO_PASS fields of the main dataset contain only empty strings.
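This is roughly how I confirmed it, loading the same dataset that run_evaluation loads:

```python
from datasets import load_from_disk

ds = load_from_disk("./data/fastapi_hf")["train"]
print(ds.column_names)
print(repr(ds[0]["PASS_TO_PASS"]))  # prints '' for every instance I checked
print(repr(ds[0]["FAIL_TO_PASS"]))  # same here
```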
The error comes from test_spec.py, around line 300:
    def _from_json_or_obj(key: str) -> Any:
        """If key points to string, load with json"""
        if isinstance(instance[key], str):
            return json.loads(instance[key])
        return instance[key]
This happens because instance["PASS_TO_PASS"] is equal to "", which I believe should not be the case.
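Just to spell out the failure mode: a populated field would be a JSON-encoded list of test identifiers, while an empty string makes json.loads raise. The test names below are made up, only to show the expected shape:

```python
import json

# Expected shape of PASS_TO_PASS / FAIL_TO_PASS (hypothetical test names):
json.loads('["tests/test_example.py::test_one", "tests/test_example.py::test_two"]')

# What my instances currently contain:
json.loads("")  # json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```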
I am not sure where the evaluation of the test_patch should have taken place. The paper describes why this process happens, but not exactly at which stage/part it should happen.
I would be very grateful for some support on this matter: whether I have missed any steps, and how to finally run the evaluation on the dataset I have currently collected.
Thank you in advance
Suggest an improvement to documentation
No response
Is the repository not maintained anymore? Because how to populate PASS_TO_PASS and FAIL_TO_PASS is crucial for recreating the experiments from the research paper.