Prompt and test improvements, cost tracking #237

nerfZael · 2024-04-18T18:42:20Z

Closes: #236
Changes:

- Cost tracking in both the CLI and benchmarks
- `--cache` option (default: False) to cache LLM requests (saves costs during development)
- `--max-rounds` option to change the maximum number of rounds (default: 100 in the CLI, 50 in tests)
- The initial user chat message now contains the address and network (agents no longer have them in their personas)
- Many prompt improvements
- The user proxy persona now acts as a proxy for the user rather than as the user (it never accepted acting as the user)
- Removed the verifier; the user proxy can now terminate
- New user proxy prompt minimizes the "stuck in a loop" issue (improves run time)
- Replaced the swap tool with bulk swap
- Less ambiguous tests
- Shorter test display names in benchmarks
- Added complex research and swap tests
- Updated autogen (now supports project keys)
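A minimal sketch of how the two new CLI options could be declared and turned into run settings. Only the option names and their defaults come from the list above; the use of `argparse` (the real CLI may use another framework), the positional `prompt` argument, the hypothetical `run_settings` helper, and the mapping of `--cache` onto autogen's `cache_seed` (a fixed integer enables the request cache, `None` disables it) and of `--max-rounds` onto a group chat's `max_round` are all assumptions for illustration.

```python
import argparse
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the option names and defaults come from the PR description.
    parser = argparse.ArgumentParser(prog="autotx")
    parser.add_argument("prompt", nargs="?", help="What AutoTx should do")
    parser.add_argument(
        "--cache",
        action="store_true",
        default=False,
        help="Cache LLM requests (saves costs while developing)",
    )
    parser.add_argument(
        "--max-rounds",
        type=int,
        default=100,  # 100 in the CLI; the tests use 50
        help="Maximum number of conversation rounds",
    )
    return parser


def run_settings(args: argparse.Namespace) -> Dict[str, Any]:
    # Assumption: a fixed cache_seed turns on autogen's request cache, None turns it off.
    return {
        "cache_seed": 41 if args.cache else None,
        "max_round": args.max_rounds,
    }
```

For example, `run_settings(build_parser().parse_args(["--cache", "--max-rounds", "50", "Swap 1 ETH for USDC"]))` enables caching and caps the conversation at 50 rounds, which is the combination the PR suggests for development and tests.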

github-actions · 2024-04-19T20:02:03Z

Finished benchmarks
Download artifacts

Test Run Summary

Run from: ./autotx/tests/agents/token/send
Base path: autotx/tests/agents/token/send/test_send.py::
Iterations: 5
Total Cost: $1.71
Total Success Rate (%): 100.00

Detailed Results

Test Name	Success Rate (%)	Passes	Fails	Avg Time	Avg Cost
`test_send_erc20`	100	5	0	20s	$0.04
`test_send_erc20_parallel`	100	5	0	36s	$0.06
`test_send_eth_multiple`	100	5	0	36s	$0.04
`test_send_native`	100	5	0	16s	$0.04
`test_send_native_sequential`	100	5	0	29s	$0.04

Total run time: 11.45 minutes

nerfZael · 2024-04-19T20:20:34Z

/workflows/benchmarks agents/token/test_swap.py,agents/token/test_swap_and_send.py

github-actions · 2024-04-19T20:20:46Z

Finished benchmarks
Download artifacts

Test Run Summary

Run from: ./autotx/tests/agents/token/test_swap.py,./autotx/tests/agents/token/test_swap_and_send.py
Base path: autotx/tests/agents/token/test_swap
Iterations: 5
Total Cost: $3.12
Total Success Rate (%): 100.00

Detailed Results

Test Name	Success Rate (%)	Passes	Fails	Avg Time	Avg Cost
`.py::test_swap_complex_1`	100	5	0	21s	$0.05
`.py::test_swap_complex_2`	100	5	0	29s	$0.05
`.py::test_swap_multiple_1`	100	5	0	20s	$0.05
`.py::test_swap_multiple_2`	100	5	0	21s	$0.05
`.py::test_swap_native`	100	5	0	19s	$0.05
`.py::test_swap_triple`	100	5	0	25s	$0.05
`.py::test_swap_with_non_default_token`	100	5	0	20s	$0.05
`_and_send.py::test_send_and_swap_complex`	100	5	0	30s	$0.05
`_and_send.py::test_send_and_swap_simple`	100	5	0	19s	$0.05
`_and_send.py::test_swap_and_send_complex`	100	5	0	30s	$0.06
`_and_send.py::test_swap_and_send_simple`	100	5	0	22s	$0.05

Total run time: 21.34 minutes
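The shorter display names in these summaries look like the test ids with the longest shared prefix of their file paths stripped off (here `autotx/tests/agents/token/test_swap`, shared by `test_swap.py` and `test_swap_and_send.py`). The sketch below reproduces every base path shown on this page under that assumption; `split_base_path` is a hypothetical helper, not the benchmark script's actual code.

```python
import os
from typing import List, Tuple


def split_base_path(test_ids: List[str]) -> Tuple[str, List[str]]:
    """Return (base_path, short_names) for pytest-style ids 'path/to/file.py::test_name'."""
    files = [test_id.split("::", 1)[0] for test_id in test_ids]
    if all(f == files[0] for f in files):
        # Every test lives in one file: strip the whole file path, leaving just the test names.
        base = files[0] + "::"
    else:
        # os.path.commonprefix compares character by character, so the base can end
        # in the middle of a file name (e.g. ".../test_swap").
        base = os.path.commonprefix(files)
    return base, [test_id[len(base):] for test_id in test_ids]
```

Applied to the whole `agents/token` suite, the same rule yields `autotx/tests/agents/token/`, because `research/`, `send/` and `test_swap*.py` diverge at the first character after that directory.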

nerfZael · 2024-04-19T20:50:01Z

/workflows/benchmarks agents/token/research

github-actions · 2024-04-19T20:50:15Z

Finished benchmarks
Download artifacts

Test Run Summary

Run from: ./autotx/tests/agents/token/research
Base path: autotx/tests/agents/token/research/test_research
Iterations: 5
Total Cost: $14.31
Total Success Rate (%): 80.00 (80/-13)

Detailed Results

Test Name	Success Rate (%)	Passes	Fails	Avg Time	Avg Cost
`.py::test_get_token_exchanges`	100	5	0	36s	$0.17
`.py::test_get_top_5_memecoins`	100	5	0	32s	$0.17
`.py::test_get_top_5_memecoins_in_optimism`	100	5	0	36s	$0.17
`.py::test_get_top_5_most_traded_tokens_from_l1`	100	5	0	31s	$0.17
`.py::test_get_top_5_tokens_from_base`	100	5	0	32s	$0.18
`.py::test_price_change_information`	100	5	0	18s	$0.17
`_and_swap.py::test_research_and_swap_many_tokens_subjective_complex`	0 (-33)	0	5	1.93m	$0.17
`_and_swap.py::test_research_and_swap_many_tokens_subjective_simple`	0 (-100)	0	5	1.21m	$0.17
`_and_swap.py::test_research_and_swap_meme_token`	100	5	0	1.44m	$0.17
`_and_swap.py::test_research_swap_and_send_governance_token`	100	5	0	34s	$0.17

Total run time: 41.17 minutes

nerfZael · 2024-04-19T21:44:26Z

/workflows/benchmarks agents/token/research/test_research_and_swap.py::test_research_and_swap_many_tokens_subjective_simple 10

github-actions · 2024-04-19T21:44:37Z

Finished benchmarks
Download artifacts

Test Run Summary

Run from: ./autotx/tests/agents/token/research/test_research_and_swap.py::test_research_and_swap_many_tokens_subjective_simple
Iterations: 10
Total Cost: $4.53
Total Success Rate (%): 70.00 (70/-30)

Detailed Results

Test Name	Success Rate (%)	Passes	Fails	Avg Time	Avg Cost
`test_research_and_swap_many_tokens_subjective_simple`	70 (-30)	7	3	1.81m	$0.45

Total run time: 18.07 minutes

nerfZael · 2024-04-22T15:21:41Z

/workflows/benchmarks agents/token 1

github-actions · 2024-04-22T15:21:54Z

Finished benchmarks
Download artifacts

Test Run Summary

Run from: ./autotx/tests/agents/token
Base path: autotx/tests/agents/token/
Iterations: 1
Total Cost: $3.59
Total Success Rate (%): 92.31 (92/-5)

Detailed Results

Test Name	Success Rate (%)	Passes	Fails	Avg Time	Avg Cost
`research/test_research.py::test_get_token_exchanges`	100	1	0	36s	$0.11
`research/test_research.py::test_get_top_5_memecoins`	100	1	0	30s	$0.17
`research/test_research.py::test_get_top_5_memecoins_in_optimism`	100	1	0	25s	$0.20
`research/test_research.py::test_get_top_5_most_traded_tokens_from_l1`	100	1	0	35s	$0.18
`research/test_research.py::test_get_top_5_tokens_from_base`	100	1	0	49s	$0.17
`research/test_research.py::test_price_change_information`	100	1	0	18s	$0.08
`research/test_research_and_swap.py::test_research_and_swap_many_tokens_subjective_complex`	0 (-33)	0	1	2.32m	$0.77
`research/test_research_and_swap.py::test_research_and_swap_many_tokens_subjective_simple`	0 (-100)	0	1	1.17m	$0.33
`research/test_research_and_swap.py::test_research_and_swap_meme_token`	100	1	0	27s	$0.20
`research/test_research_and_swap.py::test_research_swap_and_send_governance_token`	100	1	0	41s	$0.24
`send/test_send.py::test_send_erc20`	100	1	0	24s	$0.06
`send/test_send.py::test_send_erc20_parallel`	100	1	0	24s	$0.06
`send/test_send.py::test_send_eth_multiple`	100	1	0	29s	$0.07
`send/test_send.py::test_send_native`	100	1	0	19s	$0.06
`send/test_send.py::test_send_native_sequential`	100	1	0	28s	$0.11
`test_swap.py::test_swap_complex_1`	100	1	0	24s	$0.08
`test_swap.py::test_swap_complex_2`	100	1	0	33s	$0.08
`test_swap.py::test_swap_multiple_1`	100	1	0	24s	$0.06
`test_swap.py::test_swap_multiple_2`	100	1	0	48s	$0.06
`test_swap.py::test_swap_native`	100	1	0	31s	$0.05
`test_swap.py::test_swap_triple`	100	1	0	32s	$0.06
`test_swap.py::test_swap_with_non_default_token`	100	1	0	16s	$0.05
`test_swap_and_send.py::test_send_and_swap_complex`	100	1	0	30s	$0.10
`test_swap_and_send.py::test_send_and_swap_simple`	100	1	0	25s	$0.07
`test_swap_and_send.py::test_swap_and_send_complex`	100	1	0	35s	$0.10
`test_swap_and_send.py::test_swap_and_send_simple`	100	1	0	22s	$0.08

Total run time: 15.22 minutes
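The headline success rates are plain pass ratios over all runs: 24 of 26 runs in the full-suite run above (92.31%), 40 of 50 in the research run (80.00%), and 7 of 10 in the single-test run (70.00%). A sketch of that aggregation, assuming each run records its test name, pass/fail status, and LLM cost; `RunResult` and `summarize` are hypothetical names, not the benchmark script's own.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunResult:
    test_name: str
    passed: bool
    cost: float  # USD spent on LLM calls during this run


def summarize(results: List[RunResult]) -> Dict[str, object]:
    """Aggregate individual runs into overall and per-test figures like a Test Run Summary."""
    per_test: Dict[str, List[RunResult]] = defaultdict(list)
    for r in results:
        per_test[r.test_name].append(r)

    passes = sum(r.passed for r in results)
    rows = {
        name: {
            "success_rate": round(100 * sum(r.passed for r in runs) / len(runs)),
            "passes": sum(r.passed for r in runs),
            "fails": sum(not r.passed for r in runs),
            "avg_cost": round(sum(r.cost for r in runs) / len(runs), 2),
        }
        for name, runs in per_test.items()
    }
    return {
        "success_rate": round(100 * passes / len(results), 2),  # e.g. 24/26 -> 92.31
        "total_cost": round(sum(r.cost for r in results), 2),
        "rows": rows,
    }
```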

nerfZael added 9 commits April 18, 2024 20:05

implemented cost tracking and prettier benchmark path display

4f83d9b

specifying address and network in initial message so no longer necessary to do it in agents, prompt improvements

b17f461

removed auto_tx prefix from tests

ad24377

Merge remote-tracking branch 'origin/main' into nerfzael/cost-and-prompt-improvements

d1916e5

added max rounds and no cache options

c3e7a27

added simple swap many test

193bc02

type fixes

4d0e49e

debug prints

e7524bf

debug print

b85a446

agentcoinorg deleted a comment from github-actions bot Apr 18, 2024

nerfZael added 3 commits April 18, 2024 22:41

removed debug prints

fd42ebb

using turbo preview model in cli because it tracks costs

c26e98c

not printing cost if 0

0411468

agentcoinorg deleted a comment from github-actions bot Apr 18, 2024

nerfZael added 6 commits April 19, 2024 13:20

updated autogen

6e06758

removed verifier as it is not needed and slows things down

f5e6c18

using costs from newest version of autogen

87f0cd4
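A sketch of reading a run's cost and applying the "not printing cost if 0" behavior from the commit above. It assumes an autogen 0.2.x-style cost summary dictionary with `usage_including_cached_inference` and `usage_excluding_cached_inference` keys, each holding a `total_cost`; verify these key names against the installed autogen version. `extract_cost` and `format_cost` are hypothetical helpers.

```python
from typing import Any, Dict, Optional

INCLUDING_CACHED = "usage_including_cached_inference"
EXCLUDING_CACHED = "usage_excluding_cached_inference"


def extract_cost(cost_summary: Dict[str, Any], include_cached: bool = False) -> float:
    """Total USD cost from an autogen-style cost summary (shape assumed, see lead-in).

    Responses served from the cache cost nothing, so the 'excluding cached' figure is
    what was actually spent; 'including cached' is what the run would cost uncached.
    """
    key = INCLUDING_CACHED if include_cached else EXCLUDING_CACHED
    return float(cost_summary.get(key, {}).get("total_cost", 0.0))


def format_cost(cost: float) -> Optional[str]:
    # Nothing to report when the run was free (e.g. every request hit the cache).
    if cost == 0:
        return None
    return f"Cost: ${cost:.2f}"
```

Separating the two figures matters once `--cache` is on: a fully cached development run reports no cost even though the same prompts would cost money against a live model.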

removed ambiguity from research tests

2f91019

updated lock file

4638c03

change no cache to cache, cache is really only useful for dev

45a25b7

agentcoinorg deleted a comment from github-actions bot Apr 19, 2024

nerfZael added 2 commits April 19, 2024 14:07

fixed issue with researcher making up categories

da389fc

made complex research test properly strict

c337de1

nerfZael changed the title ~~[WIP] Cost tracking, prompt improvements, CLI options~~ [WIP] Prompt improvements, Test improvements, Cost tracking, CLI options Apr 19, 2024

nerfZael changed the title ~~[WIP] Prompt improvements, Test improvements, Cost tracking, CLI options~~ [WIP] Prompt improvements, Test improvements, Cost tracking Apr 19, 2024

nerfZael changed the title ~~[WIP] Prompt improvements, Test improvements, Cost tracking~~ [WIP] Prompt and test improvements, cost tracking Apr 19, 2024

removed ambiguity from test

43204ef

agentcoinorg deleted a comment from github-actions bot Apr 19, 2024

fixed types

c4c4705

nerfZael added 2 commits April 19, 2024 23:29

clarifier can now terminate

d30f47c
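autogen's `ConversableAgent` (and therefore `UserProxyAgent`) accepts an `is_termination_msg` callable that receives an incoming message as a dictionary and returns whether the conversation should end. A minimal predicate of that kind is sketched below; the `TERMINATE` marker is an assumption (a common autogen convention), since the PR does not show how the clarifier or user proxy signals termination.

```python
from typing import Any, Dict

TERMINATE_MARKER = "TERMINATE"  # hypothetical marker the agent is prompted to emit


def is_termination_msg(message: Dict[str, Any]) -> bool:
    """Return True when an incoming chat message asks to end the conversation."""
    content = message.get("content")
    # Tool calls and multimodal messages may carry no plain-text content.
    if not isinstance(content, str):
        return False
    return content.rstrip().endswith(TERMINATE_MARKER)
```

Requiring the marker at the end of the message, rather than anywhere in it, avoids stopping early when an agent merely mentions the word while explaining what it will do.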

fixed cost tracking for benchmarks

a13557b

nerfZael changed the title ~~[WIP] Prompt and test improvements, cost tracking~~ Prompt and test improvements, cost tracking Apr 19, 2024

nerfZael requested review from cbrzn and dOrgJelli and removed request for cbrzn April 19, 2024 22:13

agentcoinorg deleted a comment from github-actions bot Apr 20, 2024

cbrzn reviewed Apr 22, 2024

autotx/AutoTx.py (resolved)

cbrzn reviewed Apr 22, 2024

autotx/AutoTx.py (outdated, resolved)

autotx/AutoTx.py (outdated, resolved)

autotx/agents/ResearchTokensAgent.py (outdated, resolved)

autotx/agents/SwapTokensAgent.py (outdated, resolved)

nerfZael and others added 6 commits April 22, 2024 14:41

updated autogen to 0.2.26

8644d0a

Update autotx/agents/ResearchTokensAgent.py

08959de

Co-authored-by: Cesar Brazon <[email protected]>

surfacing swap errors to user and prompt improvements

29abec0

Merge branch 'nerfzael/cost-and-prompt-improvements' of github.com:polywrap/AutoTx into nerfzael/cost-and-prompt-improvements

072c756

explicit type

1a97444

simplified config

3df4108

cbrzn approved these changes Apr 22, 2024

nerfZael merged commit 417ce46 into main Apr 22, 2024
1 check passed
