(feat): Prompt engineering to remind o1 to generate a patch #4807
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…AI#4408) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…-AI#4412) Co-authored-by: Robert Brennan <[email protected]>
It would be good if @xingyaoww can take a look at it. Currently, I believe we don't have function calling with o1. That might change, and then function calling allows multiple actions. Maybe that's okay, we just may need to adjust the other part (the finish part) in the prompt for function calling.
FC is not available from the API side; the biggest issue here is o1 forgetting to generate a patch. Most of the
… variable
1. Added SWE_BENCH_RUN environment variable in run_infer.py
2. Modified codeact_agent.py to only show SWE Bench specific instructions when SWE_BENCH_RUN is true
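A minimal sketch of the gating this commit describes, assuming the flag is read from the process environment and compared against the string "true". The function names and the reminder text are hypothetical; only the SWE_BENCH_RUN variable name comes from the commit message.

```python
import os

# Hypothetical reminder text; the wording used in the PR is not quoted in this thread.
SWE_BENCH_REMINDER = "Only finish once the repository changes that resolve the issue are in place."


def swe_bench_run_enabled(env=None) -> bool:
    """Return True when SWE_BENCH_RUN is set to 'true' (case-insensitive)."""
    env = os.environ if env is None else env
    return env.get("SWE_BENCH_RUN", "false").strip().lower() == "true"


def build_reminder(base: str, env=None) -> str:
    """Append the SWE-bench specific instructions only when the flag is on."""
    if swe_bench_run_enabled(env):
        return f"{base}\n{SWE_BENCH_REMINDER}"
    return base
```

Passing the environment as a parameter keeps the check easy to test without mutating os.environ.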
Fixed! PTAL @xingyaoww
Hmm, giving it a second thought, I'm actually wondering if we can do this inside run_infer.py
-- Can we just add this as a suffix to the initial instruction?
The reason is that the environment reminder will likely go away once I get #4711 working -- then everything will be organized as "function calling" code
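The suggestion above could look roughly like the following sketch, where the evaluation script appends a suffix to the instruction it builds instead of the agent injecting a reminder. The function name, the instruction template, and the suffix wording are all hypothetical and are not taken from run_infer.py.

```python
def build_initial_instruction(issue_text: str, suffix: str = "") -> str:
    """Build the first message sent to the agent, with an optional trailing reminder."""
    # Hypothetical template; the real instruction text lives in run_infer.py.
    instruction = f"Please resolve the following issue:\n\n{issue_text.strip()}"
    if suffix:
        # The suffix goes last so it is the final thing the model reads.
        instruction += "\n\n" + suffix.strip()
    return instruction
```

Keeping the suffix in the evaluation script means the agent code needs no benchmark-specific branch.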
Fixed! PTAL, @xingyaoww
A few nits, but overall LGTM!
Fixed! PTAL again :D @xingyaoww
LGTM! Thanks!
@AlexCuadron After giving it more thought, I think maybe we should revert or revise this -- basically the agent SHOULD NOT be aware of the patch directly. Instead, it should simply interact with the environment to edit files and let us know when it has finished, and we will grab a patch from
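To illustrate the separation being proposed here (the agent only edits files, and the harness derives the patch afterwards), here is a sketch that builds a unified diff from before and after snapshots of a file using Python's difflib. Where the patch is actually taken from is cut off in the comment above, so the snapshot approach and the make_patch name are assumptions for illustration only.

```python
import difflib


def make_patch(path: str, before: str, after: str) -> str:
    """Return a unified diff for one file, or an empty string if nothing changed."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# The agent never sees this step; the harness runs it after the agent finishes.
patch = make_patch("pkg/mod.py", "x = 1\n", "x = 2\n")
```

With this split, the prompt only needs to tell the agent to solve the task, not to produce a patch.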
Can you share more info on how o1 behaves when this is not provided? I'd assume we may be able to tweak other parts of the instruction to get it working instead of explicitly mentioning "patch".
Maybe we can rephrase it as "MUST solve the task".
@enyst I think it ultimately comes down to the eventual effectiveness -- "MUST solve the task" might not work very well for o1 :( I guess we can wait for @AlexCuadron's testing to see what works and what doesn't, and we can figure it out from there.
@xiangyue9607 @enyst o1 is not very good at instruction following; it needs a reminder at the end. Otherwise, once it has reproduced the issue, it thinks its task is done and it generates a
End-user friendly description of the problem this fixes or functionality that this introduces
Give a summary of what the PR does, explaining any non-trivial design decisions
o1 is very forgetful: it usually stops coding as soon as it reproduces the issue.
Modified the environment reminder so that o1 remembers to output only one action at a time and to finish only once it has generated a patch.
Link of any specific issues this addresses