(feat): Prompt engineering to remind o1 to generate a patch #4807

AlexCuadron · 2024-11-06T23:38:12Z

End-user friendly description of the problem this fixes or functionality that this introduces

[] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

O1 is very forgetful: it usually stops coding as soon as it reproduces the issue.
This PR modifies the environment reminder so that o1 remembers to output only one action per turn and to finish only after it has generated a patch.

Link of any specific issues this addresses

enyst · 2024-11-06T23:42:39Z

It would be good if @xingyaoww can take a look at it.

Currently, I believe we don't have function calling with o1. That might change, and then function calling allows multiple actions. Maybe that's okay, we just may need to adjust the other part (the finish part) in the prompt for function calling.

AlexCuadron · 2024-11-06T23:46:28Z

> It would be good if @xingyaoww can take a look at it.
>
> Currently, I believe we don't have function calling with o1. That might change, and then function calling allows multiple actions. Maybe that's okay, we just may need to adjust the other part (the finish part) in the prompt for function calls.

FC is not available from the API side. The biggest issue here is that o1 forgets to generate a patch: most of the git diffs generated by o1-preview or o1-mini contain only reproduce_issue.py.
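The failure mode described here can be checked mechanically. The following is a minimal sketch, not OpenHands code: it assumes the result is available as a unified `git diff` string and that reproduction scripts are named `reproduce_issue.py` (the only name mentioned in this thread). The helper names `changed_files` and `is_reproduction_only` are hypothetical.

```python
import re

# Hypothetical set of file names treated as reproduction scripts; only
# reproduce_issue.py is mentioned in this thread.
REPRO_FILE_NAMES = {"reproduce_issue.py"}

# Matches the header line that git emits for each file in a diff.
_DIFF_HEADER = re.compile(r"^diff --git a/(.+?) b/(.+)$", re.MULTILINE)


def changed_files(git_diff: str) -> list[str]:
    """Return the post-change path of every file touched by the diff."""
    return [match.group(2) for match in _DIFF_HEADER.finditer(git_diff)]


def is_reproduction_only(git_diff: str) -> bool:
    """True when the diff is empty or touches nothing but reproduction scripts."""
    files = changed_files(git_diff)
    return all(path.rsplit("/", 1)[-1] in REPRO_FILE_NAMES for path in files)
```

A check like this could flag evaluation runs where the model finished right after reproducing the issue, without editing the codebase itself.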

openhands/agenthub/codeact_agent/codeact_agent.py

AlexCuadron · 2024-11-07T06:38:44Z

Fixed! PTAL @xingyaoww

xingyaoww

Hmm, giving it a second thought, I'm actually wondering if we can do this inside run_infer.py. Can we just add this as a suffix to the initial instruction?

The reason is that the environment reminder will likely go away once I get #4711 working. After that, everything will be organized as "function calling" code.
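To illustrate the suggestion, here is a minimal sketch of appending the reminder as a suffix to the initial instruction. It is not the actual run_infer.py code: the function name `build_instruction`, the substring check on the model name, and the `<pr_description>` wrapper layout are assumptions made for illustration. The reminder text is the one quoted later in this thread.

```python
# Reminder text as quoted in this PR's review discussion.
O1_REMINDER = (
    "<IMPORTANT>\n"
    "- You MUST generate only one action per turn!\n"
    "- A patch is a set of changes to the source code of the codebase that you are given\n"
    "- You MUST generate a patch that attempts to fix the issue described in the <pr_description>\n"
    "</IMPORTANT>\n"
)


def build_instruction(problem_statement: str, model_name: str) -> str:
    """Return the initial instruction, with the reminder appended for o1 models."""
    instruction = f"<pr_description>\n{problem_statement}\n</pr_description>\n"
    # Assumption: o1 models are recognised by a plain substring match on the name.
    if "o1" in model_name.lower():
        instruction += "\n" + O1_REMINDER
    return instruction
```

Keeping the suffix in the evaluation script rather than in the agent means the agent prompt stays unchanged for other models and other use cases.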

AlexCuadron · 2024-11-08T02:05:49Z

Fixed! PTAL, @xingyaoww

xingyaoww

A few nits, but overall LGTM!

evaluation/swe_bench/run_infer.py

openhands/agenthub/codeact_agent/codeact_agent.py

AlexCuadron · 2024-11-08T02:48:28Z

Fixed! PTAL again :D @xingyaoww

xingyaoww

LGTM! Thanks!

xingyaoww · 2024-11-08T15:19:27Z

@AlexCuadron After giving it more thought, I think we should maybe revert or revise this. Basically, the agent SHOULD NOT be aware of the patch directly. Instead, it should simply interact with the environment to edit files and let us know when it has finished, and we will grab a patch from git diff.

<IMPORTANT>
- You MUST generate only one action per turn!
- A patch is a set of changes to the source code of the codebase that you are given
- You MUST generate a patch that attempts to fix the issue described in the <pr_description>
</IMPORTANT>
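The workflow described in this comment (the agent only edits files, and the harness collects the patch afterwards) can be sketched as follows. This is an illustrative sketch, not the actual evaluation code: the function name `get_patch` and its parameters are hypothetical, and it assumes `git` is installed and `repo_dir` is a git working tree.

```python
import subprocess


def get_patch(repo_dir: str, base_commit: str = "HEAD") -> str:
    """Return the changes made since base_commit as a unified diff."""
    # Stage everything first so that newly created files appear in the diff.
    subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
    result = subprocess.run(
        ["git", "diff", "--no-color", "--cached", base_commit],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With this split, the prompt never needs to mention a "patch": an empty return value simply means the agent finished without changing anything.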

xingyaoww · 2024-11-08T15:21:17Z

Can you share more info on how o1 behaves when this is not provided? I'd assume we may be able to tweak other parts of the instruction to get it working instead of explicitly mentioning "patch".

enyst · 2024-11-08T16:09:22Z

Maybe we can rephrase it as "MUST solve the task".

xingyaoww · 2024-11-08T16:13:03Z

@enyst I think it ultimately comes down to the eventual effectiveness: "MUST solve the task" might not work very well for o1 :( I guess we can wait for @AlexCuadron's testing to see what works and what doesn't, and figure it out from there.

AlexCuadron · 2024-11-08T23:14:01Z

@xiangyue9607 @enyst O1 is not very good at instruction following; it needs a reminder at the end. Otherwise, it thinks its task is done once it has reproduced the issue, and it generates a <finish></finish>.
But I can try adding a reminder to produce meaningful changes to the codebase before submission.

AlexCuadron and others added 25 commits October 15, 2024 17:31

Updated tests

76cdcd1

chore(deps): bump litellm from 1.49.3 to 1.49.4 (All-Hands-AI#4406)

3beaf5c

Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

chore(deps-dev): bump llama-index from 0.11.17 to 0.11.18 (All-Hands-AI#4408)

c8db8aa

Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

chore(deps): bump modal from 0.64.181 to 0.64.182 (All-Hands-AI#4407)

308dc62

Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

refactor: move get_pairs from memory to shared utils (All-Hands-AI#4411)

158a923

Fix eval output path in case of @ char (All-Hands-AI#4416)

b6a9163

Fix for lockup - create the runtime in a background thread (All-Hands-AI#4412)

8ba531a

Co-authored-by: Robert Brennan <[email protected]>

Merge remote-tracking branch 'upstream/main'

87f6870

Merge remote-tracking branch 'upstream/main'

6037e20

Merge remote-tracking branch 'upstream/main'

0c5de4c

Added support to specify the platform on which the image should be built.

12798fd

Merge remote-tracking branch 'upstream/main'

ef3646f

Merge remote-tracking branch 'upstream/main'

18bdb56

Merge remote-tracking branch 'upstream/main'

7ca0de6

Merge remote-tracking branch 'upstream/main'

5a76cc8

Merge remote-tracking branch 'upstream/main'

4a7ef31

Merge remote-tracking branch 'upstream/main'

32c69af

Merge remote-tracking branch 'upstream/main'

bf8b4c0

Merge remote-tracking branch 'upstream/main'

e284c95

Merge remote-tracking branch 'upstream/main'

619bbf1

Merge remote-tracking branch 'upstream/main'

65ec945

Merge remote-tracking branch 'upstream/main'

d644f45

Merge remote-tracking branch 'upstream/main'

ec94128

prompt engineering to remind o1 to generate a patch

064e4ad

Merge branch 'main' into o1

b05b47f

enyst requested a review from xingyaoww November 6, 2024 23:41

xingyaoww reviewed Nov 7, 2024

openhands/agenthub/codeact_agent/codeact_agent.py (outdated, resolved)

Make SWE Bench specific instructions conditional based on environment variable

aaba2d4

1. Added SWE_BENCH_RUN environment variable in run_infer.py
2. Modified codeact_agent.py to only show SWE Bench specific instructions when SWE_BENCH_RUN is true

AlexCuadron force-pushed the o1 branch from ff5e836 to aaba2d4 November 7, 2024 05:08

AlexCuadron added 4 commits November 7, 2024 06:08

Merge branch 'main' into o1

7773fc6

fix

b19fdf1

fix

f4b7066

fix

af947b0

xingyaoww reviewed Nov 7, 2024

AlexCuadron added 5 commits November 7, 2024 17:36

fixed according to comments

9cf4927

Merge branch 'main' into o1

f76909e

Update run_infer.py

bc9f635

Update run_infer.py

deb54f7

Merge branch 'main' into o1

e80d170

xingyaoww reviewed Nov 8, 2024

evaluation/swe_bench/run_infer.py (outdated, resolved)

openhands/agenthub/codeact_agent/codeact_agent.py (outdated, resolved)

fixed based on comments

1033d61

xingyaoww approved these changes Nov 8, 2024

xingyaoww enabled auto-merge (squash) November 8, 2024 03:03

xingyaoww merged commit a6810fa into All-Hands-AI:main Nov 8, 2024
8 checks passed

xingyaoww added a commit that referenced this pull request Nov 8, 2024

Revert "(feat): Prompt engineering to remind o1 to generate a patch (#4807)"

93ac44e

This reverts commit a6810fa.

xingyaoww added a commit that referenced this pull request Nov 8, 2024

Revert "(feat): Prompt engineering to remind o1 to generate a patch (#4807)"

eff8490

This reverts commit a6810fa.

xingyaoww mentioned this pull request Nov 8, 2024

Revert "(feat): Prompt engineering to remind o1 to generate a patch" #4846

Merged
