Prompt and test improvements, cost tracking #237

Merged (32 commits) on Apr 22, 2024

Conversation

nerfZael (Contributor) commented on Apr 18, 2024

Closes: #236
Changes:

  • Cost tracking in both the CLI and the benchmarks
  • --cache option (default False) to cache LLM requests (saves costs during development)
  • --max-rounds option to change the maximum number of rounds (default 100 in the CLI, 50 in tests)
  • The initial user chat message now contains the address and network (agents no longer have them in their persona)
  • Many prompt improvements
  • The user proxy persona is now a proxy for the user instead of the user itself (it never accepted acting as a user)
  • Removed the verifier; the user proxy can now terminate
  • The new user proxy prompt minimizes the "stuck in loop" issue (improves run time)
  • Replaced the swap tool with bulk swap
  • Less ambiguous tests
  • Shorter test display names in benchmarks
  • Added complex research and swap tests
  • Updated autogen (now supports project keys)
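
To illustrate why a request cache reduces cost during development, here is a minimal sketch of a cached LLM wrapper with a running cost total. It is not the code from this PR: `CachedLLM`, `fake_llm`, the request shape, and the flat $0.01 price are all hypothetical names and values chosen for the example. Only the idea of an opt-in cache (off by default, like `--cache`) and a tracked total cost comes from the list above.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple

# Hypothetical types: a request is a JSON-serializable dict, and the LLM call
# returns (response_text, cost_in_usd). Neither is AutoTx's real API.
LLMCall = Callable[[dict], Tuple[str, float]]


class CachedLLM:
    """Wraps an LLM call with an optional in-memory cache and a cost total."""

    def __init__(self, call: LLMCall, use_cache: bool = False) -> None:
        self._call = call
        self._use_cache = use_cache  # mirrors the --cache flag (default False)
        self._cache: Dict[str, str] = {}
        self.total_cost = 0.0

    def request(self, payload: dict) -> str:
        # Key on a stable hash so identical requests map to the same entry.
        key = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if self._use_cache and key in self._cache:
            return self._cache[key]  # cache hit: no new cost is added
        text, cost = self._call(payload)
        self.total_cost += cost
        if self._use_cache:
            self._cache[key] = text
        return text


def fake_llm(payload: dict) -> Tuple[str, float]:
    # Stand-in for a real model call; the flat $0.01 price is made up.
    return f"echo: {payload['prompt']}", 0.01


llm = CachedLLM(fake_llm, use_cache=True)
llm.request({"prompt": "send 1 ETH"})
llm.request({"prompt": "send 1 ETH"})  # served from the cache
print(f"Total cost: ${llm.total_cost:.2f}")
```

With the cache enabled, the second identical request adds nothing to the total; with `use_cache=False`, both calls would be billed.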

nerfZael changed the title from "[WIP] Cost tracking, prompt improvements, CLI options" to "[WIP] Prompt improvements, Test improvements, Cost tracking, CLI options" on Apr 19, 2024
nerfZael changed the title from "[WIP] Prompt improvements, Test improvements, Cost tracking, CLI options" to "[WIP] Prompt improvements, Test improvements, Cost tracking" on Apr 19, 2024
nerfZael changed the title from "[WIP] Prompt improvements, Test improvements, Cost tracking" to "[WIP] Prompt and test improvements, cost tracking" on Apr 19, 2024

github-actions bot commented Apr 19, 2024

Finished benchmarks
Download artifacts

Test Run Summary

  • Run from: ./autotx/tests/agents/token/send
  • Base path: autotx/tests/agents/token/send/test_send.py::
  • Iterations: 5
  • Total Cost: $1.71
  • Total Success Rate (%): 100.00

Detailed Results

| Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
| --- | --- | --- | --- | --- | --- |
| test_send_erc20 | 100 | 5 | 0 | 20s | $0.04 |
| test_send_erc20_parallel | 100 | 5 | 0 | 36s | $0.06 |
| test_send_eth_multiple | 100 | 5 | 0 | 36s | $0.04 |
| test_send_native | 100 | 5 | 0 | 16s | $0.04 |
| test_send_native_sequential | 100 | 5 | 0 | 29s | $0.04 |

Total run time: 11.45 minutes

nerfZael (Contributor, Author) commented:

/workflows/benchmarks agents/token/test_swap.py,agents/token/test_swap_and_send.py


github-actions bot commented Apr 19, 2024

Finished benchmarks
Download artifacts

Test Run Summary

  • Run from: ./autotx/tests/agents/token/test_swap.py,./autotx/tests/agents/token/test_swap_and_send.py
  • Base path: autotx/tests/agents/token/test_swap
  • Iterations: 5
  • Total Cost: $3.12
  • Total Success Rate (%): 100.00

Detailed Results

| Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
| --- | --- | --- | --- | --- | --- |
| .py::test_swap_complex_1 | 100 | 5 | 0 | 21s | $0.05 |
| .py::test_swap_complex_2 | 100 | 5 | 0 | 29s | $0.05 |
| .py::test_swap_multiple_1 | 100 | 5 | 0 | 20s | $0.05 |
| .py::test_swap_multiple_2 | 100 | 5 | 0 | 21s | $0.05 |
| .py::test_swap_native | 100 | 5 | 0 | 19s | $0.05 |
| .py::test_swap_triple | 100 | 5 | 0 | 25s | $0.05 |
| .py::test_swap_with_non_default_token | 100 | 5 | 0 | 20s | $0.05 |
| _and_send.py::test_send_and_swap_complex | 100 | 5 | 0 | 30s | $0.05 |
| _and_send.py::test_send_and_swap_simple | 100 | 5 | 0 | 19s | $0.05 |
| _and_send.py::test_swap_and_send_complex | 100 | 5 | 0 | 30s | $0.06 |
| _and_send.py::test_swap_and_send_simple | 100 | 5 | 0 | 22s | $0.05 |

Total run time: 21.34 minutes

nerfZael (Contributor, Author) commented:

/workflows/benchmarks agents/token/research


github-actions bot commented Apr 19, 2024

Finished benchmarks
Download artifacts

Test Run Summary

  • Run from: ./autotx/tests/agents/token/research
  • Base path: autotx/tests/agents/token/research/test_research
  • Iterations: 5
  • Total Cost: $14.31
  • Total Success Rate (%): 80.00 (80/-13)

Detailed Results

| Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
| --- | --- | --- | --- | --- | --- |
| .py::test_get_token_exchanges | 100 | 5 | 0 | 36s | $0.17 |
| .py::test_get_top_5_memecoins | 100 | 5 | 0 | 32s | $0.17 |
| .py::test_get_top_5_memecoins_in_optimism | 100 | 5 | 0 | 36s | $0.17 |
| .py::test_get_top_5_most_traded_tokens_from_l1 | 100 | 5 | 0 | 31s | $0.17 |
| .py::test_get_top_5_tokens_from_base | 100 | 5 | 0 | 32s | $0.18 |
| .py::test_price_change_information | 100 | 5 | 0 | 18s | $0.17 |
| _and_swap.py::test_research_and_swap_many_tokens_subjective_complex | 0 (-33) | 0 | 5 | 1.93m | $0.17 |
| _and_swap.py::test_research_and_swap_many_tokens_subjective_simple | 0 (-100) | 0 | 5 | 1.21m | $0.17 |
| _and_swap.py::test_research_and_swap_meme_token | 100 | 5 | 0 | 1.44m | $0.17 |
| _and_swap.py::test_research_swap_and_send_governance_token | 100 | 5 | 0 | 34s | $0.17 |

Total run time: 41.17 minutes

nerfZael (Contributor, Author) commented:

/workflows/benchmarks agents/token/research/test_research_and_swap.py::test_research_and_swap_many_tokens_subjective_simple 10


github-actions bot commented Apr 19, 2024

Finished benchmarks
Download artifacts

Test Run Summary

  • Run from: ./autotx/tests/agents/token/research/test_research_and_swap.py::test_research_and_swap_many_tokens_subjective_simple
  • Iterations: 10
  • Total Cost: $4.53
  • Total Success Rate (%): 70.00 (70/-30)

Detailed Results

| Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
| --- | --- | --- | --- | --- | --- |
| test_research_and_swap_many_tokens_subjective_simple | 70 (-30) | 7 | 3 | 1.81m | $0.45 |

Total run time: 18.07 minutes

nerfZael changed the title from "[WIP] Prompt and test improvements, cost tracking" to "Prompt and test improvements, cost tracking" on Apr 19, 2024
nerfZael requested review from cbrzn and dOrgJelli and removed the request for cbrzn on April 19, 2024, 22:13
nerfZael (Contributor, Author) commented:

/workflows/benchmarks agents/token 1
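
The benchmark commands in this thread follow a visible pattern: one or more comma-separated test paths, then an optional iteration count. The sketch below parses that pattern. It is a hypothetical reconstruction from the comments and bot replies here, not the workflow's actual implementation: the `./autotx/tests/` prefix and the default of 5 iterations are inferred from the "Run from" and "Iterations" lines, and `BenchmarkRequest` and `parse_benchmark_comment` are made-up names.

```python
from dataclasses import dataclass
from typing import List

COMMAND = "/workflows/benchmarks"
TESTS_ROOT = "./autotx/tests/"  # prefix seen in the bot's "Run from" lines
DEFAULT_ITERATIONS = 5          # value reported when no count is given


@dataclass
class BenchmarkRequest:
    targets: List[str]
    iterations: int


def parse_benchmark_comment(comment: str) -> BenchmarkRequest:
    """Hypothetical parser; not the workflow's actual implementation."""
    parts = comment.split()
    if len(parts) not in (2, 3) or parts[0] != COMMAND:
        raise ValueError(f"not a benchmark command: {comment!r}")
    # Several targets can be given as one comma-separated argument.
    targets = [TESTS_ROOT + path for path in parts[1].split(",")]
    iterations = int(parts[2]) if len(parts) == 3 else DEFAULT_ITERATIONS
    return BenchmarkRequest(targets, iterations)


print(parse_benchmark_comment("/workflows/benchmarks agents/token 1"))
print(parse_benchmark_comment("/workflows/benchmarks agents/token/research"))
```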


github-actions bot commented Apr 22, 2024

Finished benchmarks
Download artifacts

Test Run Summary

  • Run from: ./autotx/tests/agents/token
  • Base path: autotx/tests/agents/token/
  • Iterations: 1
  • Total Cost: $3.59
  • Total Success Rate (%): 92.31 (92/-5)

Detailed Results

| Test Name | Success Rate (%) | Passes | Fails | Avg Time | Avg Cost |
| --- | --- | --- | --- | --- | --- |
| research/test_research.py::test_get_token_exchanges | 100 | 1 | 0 | 36s | $0.11 |
| research/test_research.py::test_get_top_5_memecoins | 100 | 1 | 0 | 30s | $0.17 |
| research/test_research.py::test_get_top_5_memecoins_in_optimism | 100 | 1 | 0 | 25s | $0.20 |
| research/test_research.py::test_get_top_5_most_traded_tokens_from_l1 | 100 | 1 | 0 | 35s | $0.18 |
| research/test_research.py::test_get_top_5_tokens_from_base | 100 | 1 | 0 | 49s | $0.17 |
| research/test_research.py::test_price_change_information | 100 | 1 | 0 | 18s | $0.08 |
| research/test_research_and_swap.py::test_research_and_swap_many_tokens_subjective_complex | 0 (-33) | 0 | 1 | 2.32m | $0.77 |
| research/test_research_and_swap.py::test_research_and_swap_many_tokens_subjective_simple | 0 (-100) | 0 | 1 | 1.17m | $0.33 |
| research/test_research_and_swap.py::test_research_and_swap_meme_token | 100 | 1 | 0 | 27s | $0.20 |
| research/test_research_and_swap.py::test_research_swap_and_send_governance_token | 100 | 1 | 0 | 41s | $0.24 |
| send/test_send.py::test_send_erc20 | 100 | 1 | 0 | 24s | $0.06 |
| send/test_send.py::test_send_erc20_parallel | 100 | 1 | 0 | 24s | $0.06 |
| send/test_send.py::test_send_eth_multiple | 100 | 1 | 0 | 29s | $0.07 |
| send/test_send.py::test_send_native | 100 | 1 | 0 | 19s | $0.06 |
| send/test_send.py::test_send_native_sequential | 100 | 1 | 0 | 28s | $0.11 |
| test_swap.py::test_swap_complex_1 | 100 | 1 | 0 | 24s | $0.08 |
| test_swap.py::test_swap_complex_2 | 100 | 1 | 0 | 33s | $0.08 |
| test_swap.py::test_swap_multiple_1 | 100 | 1 | 0 | 24s | $0.06 |
| test_swap.py::test_swap_multiple_2 | 100 | 1 | 0 | 48s | $0.06 |
| test_swap.py::test_swap_native | 100 | 1 | 0 | 31s | $0.05 |
| test_swap.py::test_swap_triple | 100 | 1 | 0 | 32s | $0.06 |
| test_swap.py::test_swap_with_non_default_token | 100 | 1 | 0 | 16s | $0.05 |
| test_swap_and_send.py::test_send_and_swap_complex | 100 | 1 | 0 | 30s | $0.10 |
| test_swap_and_send.py::test_send_and_swap_simple | 100 | 1 | 0 | 25s | $0.07 |
| test_swap_and_send.py::test_swap_and_send_complex | 100 | 1 | 0 | 35s | $0.10 |
| test_swap_and_send.py::test_swap_and_send_simple | 100 | 1 | 0 | 22s | $0.08 |

Total run time: 15.22 minutes
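
The total success rates reported in these summaries are consistent with dividing all passing runs by all runs across tests. The sketch below reproduces two of the reported figures from the pass/fail counts in the tables. The formula is an assumption that happens to match the numbers, not something confirmed from the benchmark script, and `total_success_rate` is a hypothetical helper name.

```python
from typing import Iterable, Tuple


def total_success_rate(results: Iterable[Tuple[int, int]]) -> float:
    """Percentage of passing runs over all (passes, fails) pairs."""
    pairs = list(results)
    passes = sum(p for p, _ in pairs)
    runs = sum(p + f for p, f in pairs)
    if runs == 0:
        raise ValueError("no runs recorded")
    return 100.0 * passes / runs


# Full agents/token run (1 iteration): 24 tests passed, 2 tests failed.
full_run = [(1, 0)] * 24 + [(0, 1)] * 2
print(f"{total_success_rate(full_run):.2f}")  # 92.31

# Earlier research run (5 iterations): 8 tests passed every time, 2 never did.
research_run = [(5, 0)] * 8 + [(0, 5)] * 2
print(f"{total_success_rate(research_run):.2f}")  # 80.00
```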

nerfZael merged commit 417ce46 into main on Apr 22, 2024
1 check passed
Successfully merging this pull request may close these issues.

Using project API key gives warnings
2 participants