🐛 [TestRemoval/TestRepair] - 211, 215- include status code in mock response #33

Open
dmelcer9 opened this issue Jul 26, 2024 · 3 comments · May be fixed by #49
Labels
bug Something isn't working

Comments

@dmelcer9

EvalPlus version

v0_1_0_hf

Output of running ls ~/.cache/bigcodebench

BigCodeBench-v0.1.0_hf.jsonl

Task ID of the programming task

BigCodeBench/211, BigCodeBench/215, probably some others as well

The original test

(All tests)
mock_response = MagicMock() 
mock_response.content = MOCK_CONTENT 
mock_requests_get.return_value = mock_response

Your proposed new test

mock_response = MagicMock() 
mock_response.content = MOCK_CONTENT 
mock_response.status_code = 200
mock_requests_get.return_value = mock_response

Description

The LLM sometimes (reasonably!) generates code like:

    if r.status_code != 200:
        print("Error: Failed to download file from URL.")
        return None

   (Rest of code solves task correctly)

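But it fails the test: status_code is never set on the mock, so MagicMock auto-creates a child mock for it, r.status_code != 200 evaluates to true, and the function returns None before reaching the code that solves the task.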
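
The failure can be reproduced without any network access. The sketch below is only an illustration: `fetch`, the `get` parameter, the example URL, and the value of `MOCK_CONTENT` are hypothetical stand-ins, not code from the benchmark tasks or from a model output. It runs the same status check against the original mock setup and then against the proposed one.

```python
from unittest.mock import MagicMock

MOCK_CONTENT = b"mock content"  # placeholder payload, assumed for illustration


def fetch(url, get):
    """Hypothetical solution: return the body, or None on a non-200 status."""
    r = get(url)
    if r.status_code != 200:
        print("Error: Failed to download file from URL.")
        return None
    return r.content


# Original setup: status_code is never assigned, so accessing it yields an
# auto-created MagicMock, which does not compare equal to 200.
mock_response = MagicMock()
mock_response.content = MOCK_CONTENT
mock_get = MagicMock(return_value=mock_response)
result_without_status = fetch("https://example.com/file", mock_get)

# Proposed setup: status_code is assigned explicitly, so the check passes.
mock_response.status_code = 200
result_with_status = fetch("https://example.com/file", mock_get)
```

With the original setup the function takes the early return; once `status_code` is set to 200 it returns the mocked content.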

Other context

No response

dmelcer9 added the bug label Jul 26, 2024
@terryyz
Collaborator

terryyz commented Jul 26, 2024

Thanks @dmelcer9! It makes sense :) We didn't think about this when developing the initial tasks. We will incorporate this change in the next dataset release.

@hvaara
Contributor

hvaara commented Sep 14, 2024

@dmelcer9 which model did you use? I'd like to verify resolution in #49.

@dmelcer9
Author

Not 100% sure, but I believe this was with Starcoder2-15b; the temperature was somewhere between 0.7 and 1.
