(feat): Prompt engineering to remind o1 to generate a patch #4807

Merged: 36 commits merged on Nov 8, 2024


AlexCuadron (Contributor) commented Nov 6, 2024

End-user friendly description of the problem this fixes or functionality that this introduces

  • [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

o1 is very forgetful: it usually stops coding as soon as it has reproduced the issue.
I modified the environment reminder so that o1 remembers to output only one action per turn and to call finish only once it has generated a patch.


Link to any specific issues this addresses

AlexCuadron and others added 25 commits October 15, 2024 17:31
enyst requested a review from xingyaoww on November 6, 2024 23:41
enyst (Collaborator) commented Nov 6, 2024

It would be good if @xingyaoww can take a look at it.

Currently, I believe we don't have function calling with o1. That might change, and function calling allows multiple actions per turn. Maybe that's okay; we may just need to adjust the other part of the prompt (the finish part) for function calling.

AlexCuadron (Contributor Author) commented:


Function calling (FC) is not available for o1 on the API side. The biggest issue here is forgetting to generate a patch: most of the git diffs generated by o1-preview or o1-mini contain only reproduce_issue.py.
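One way to measure the failure described above is to check whether a git diff touches anything besides the reproduction script. The sketch below is illustrative only: `changed_files` and `has_no_source_changes` are hypothetical helpers, not part of this PR, and the header parsing assumes paths without spaces.

```python
import re
from typing import Set

# git prints one header per changed file: "diff --git a/<old path> b/<new path>".
# Assumption: paths contain no spaces, so \S+ is enough to capture them.
_DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def changed_files(diff_text: str) -> Set[str]:
    """Return the post-change paths named in a unified git diff."""
    return {match.group(2) for match in _DIFF_HEADER.finditer(diff_text)}


def has_no_source_changes(diff_text: str, script_name: str = "reproduce_issue.py") -> bool:
    """True if the diff is empty or touches nothing but the reproduction script."""
    # all() over an empty set is True, so an empty diff also counts as "no fix".
    return all(
        path.rsplit("/", 1)[-1] == script_name for path in changed_files(diff_text)
    )
```

A diff that adds only `reproduce_issue.py` is flagged, while one that also edits a source file is not.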

… variable

1. Added SWE_BENCH_RUN environment variable in run_infer.py
2. Modified codeact_agent.py to only show SWE Bench specific instructions when SWE_BENCH_RUN is true
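The gating in the commit above could look roughly like the sketch below. The flag name `SWE_BENCH_RUN` and the reminder text come from this PR; the helper names `swe_bench_run_enabled` and `environment_reminder`, and the set of accepted truthy values, are assumptions for illustration and not the actual code in codeact_agent.py.

```python
import os
from typing import Mapping, Optional

# Reminder text as quoted in this PR's discussion; where it gets injected is an assumption.
SWE_BENCH_REMINDER = (
    "<IMPORTANT>\n"
    "- You MUST generate only one action per turn!\n"
    "- A patch is a set of changes to the source code of the codebase that you are given\n"
    "- You MUST generate a patch that attempts to fix the issue described in the <pr_description>\n"
    "</IMPORTANT>"
)

# Assumed set of values that count as "on"; anything else (or unset) is off.
_TRUTHY = {"1", "true", "yes"}


def swe_bench_run_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the SWE_BENCH_RUN flag from the given mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    return env.get("SWE_BENCH_RUN", "false").strip().lower() in _TRUTHY


def environment_reminder(base: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Append the SWE-Bench-specific instructions only during SWE-Bench runs."""
    if swe_bench_run_enabled(env):
        return f"{base}\n\n{SWE_BENCH_REMINDER}"
    return base
```

Under this sketch, run_infer.py would set `os.environ["SWE_BENCH_RUN"] = "true"` before the agent is created, and any other use of the agent would see the unmodified reminder.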
AlexCuadron (Contributor Author) commented:

Fixed! PTAL @xingyaoww

xingyaoww (Collaborator) left a comment:

Hmm, giving it a second thought, I'm actually wondering if we can do this inside run_infer.py: can we just add this as a suffix to the initial instruction?

The reason is that the environment reminder will likely go away once I get #4711 working; then everything will be organized as "function calling" code.
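The suggestion above, moving the reminder into run_infer.py as a suffix of the initial instruction, might be sketched as follows. `build_instruction` and `REMINDER_SUFFIX` are hypothetical names, the `<pr_description>` wrapping is an assumption based on the tag the reminder refers to, and the suffix wording paraphrases this PR's description rather than quoting the merged prompt.

```python
# Hypothetical suffix paraphrasing the PR description; not the merged wording.
REMINDER_SUFFIX = (
    "Reminder: output only one action per turn, and only finish once you have "
    "modified the source code to fix the issue (a reproduction script alone is not a fix)."
)


def build_instruction(problem_statement: str, with_reminder: bool = True) -> str:
    """Build the first user message; the reminder goes last so the model reads it last."""
    # Assumed wrapping: the issue text inside <pr_description> tags.
    instruction = f"<pr_description>\n{problem_statement.strip()}\n</pr_description>\n"
    if with_reminder:
        instruction += "\n" + REMINDER_SUFFIX + "\n"
    return instruction
```

Because the suffix lives in the instruction rather than in the agent's environment reminder, it would survive a move to function-calling-based prompts, which is the concern raised about #4711.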

AlexCuadron (Contributor Author) commented:

Fixed! PTAL, @xingyaoww

xingyaoww (Collaborator) left a comment:

A few nits, but overall LGTM!

AlexCuadron (Contributor Author) commented:

Fixed! PTAL again :D @xingyaoww

xingyaoww (Collaborator) left a comment:

LGTM! Thanks!

xingyaoww enabled auto-merge (squash) on November 8, 2024 03:03
xingyaoww merged commit a6810fa into All-Hands-AI:main on Nov 8, 2024
8 checks passed
xingyaoww (Collaborator) commented:

@AlexCuadron After giving it more thought, I think maybe we should revert or revise this. Basically, the agent SHOULD NOT be aware of the patch directly; instead, it should simply interact with the environment to edit files and let us know when it has finished, and we will grab a patch from git diff.

<IMPORTANT>
- You MUST generate only one action per turn!
- A patch is a set of changes to the source code of the codebase that you are given
- You MUST generate a patch that attempts to fix the issue described in the <pr_description>
</IMPORTANT>

xingyaoww (Collaborator) commented:

Can you share more info on how o1 behaves when this is not provided? I'd assume we may be able to tweak other parts of the instruction to get it working instead of explicitly mentioning "patch".

enyst (Collaborator) commented Nov 8, 2024

Maybe we can rephrase it as "MUST solve the task".

xingyaoww (Collaborator) commented:

@enyst I think it ultimately comes down to effectiveness: "MUST solve the task" might not work very well for o1 :( I guess we can wait for @AlexCuadron's testing to see what works and what doesn't, and figure it out from there.

AlexCuadron (Contributor Author) commented Nov 8, 2024

@xiangyue9607 @enyst o1 is not very good at instruction following; it needs a reminder at the end. Otherwise, it thinks its task is done once it has reproduced the issue, and it emits <finish></finish>.
But I can try to add a reminder to produce meaningful changes to the codebase before submission.


5 participants