Together.ai Runner #215

Merged
merged 2 commits into from
Aug 30, 2024


wongjingping
Collaborator

Changes

  • Add a runner for together.ai's API, closely modeled on bedrock_runner.py.
  • Extend generate_prompt.py to accept JSON prompt files that contain a list of messages, each with a role and content. For these files it returns a list of message dicts suitable for the messages parameter of the OpenAI chat completions API.
  • Add table aliases as a separate table_aliases field in the main dataframe, since cot_instructions is a more general, experimental field (e.g. for reasoning). prompt_together.json reads the table_aliases field instead of cot_instructions.
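
The JSON prompt handling described above could look roughly like the sketch below. It is not the code from generate_prompt.py: the function name load_chat_prompt, the placeholder names (user_question, table_aliases), and the use of str.format for substitution are all assumptions made for illustration.

```python
import json


def load_chat_prompt(path, **fields):
    """Load a JSON prompt file and return chat messages with placeholders filled.

    Hypothetical helper: the file is assumed to hold a list of
    {"role": ..., "content": ...} objects whose content may contain
    str.format placeholders such as {user_question}.
    """
    with open(path) as f:
        messages = json.load(f)
    # Build new dicts so the parsed template itself is left untouched.
    return [
        {"role": m["role"], "content": m["content"].format(**fields)}
        for m in messages
    ]
```

The returned list has the shape expected by the messages parameter of an OpenAI-style chat completions call. Note that str.format treats literal braces in the template as placeholders, so a template containing them would need escaping.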

Testing

Ran the full evaluation over meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo as follows:

python3 main.py \
  -db postgres \
  -q data/instruct_basic_postgres.csv data/instruct_advanced_postgres.csv data/questions_gen_postgres.csv \
  -o results/together_llama_70b_basic.csv results/together_llama_70b_advanced.csv results/together_llama_70b_v1.csv \
  -g together \
  -f prompts/prompt_together.json \
  --cot_table_alias prealias \
  -m "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" \
  -c 0 \
  -p 10

Got the following results:

Question File                     Accuracy
instruct_basic_postgres.csv       90.00%
instruct_advanced_postgres.csv    67.19%
questions_gen_postgres.csv        80.00%

The results are slightly better than the previous evaluation on bedrock, see #116.

Speed-wise, it was extremely fast (at 10 concurrent requests):

  • median latency on v1 was 0.99s
  • p99 latency on v1 was 2.43s
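
For reference, median and p99 figures like these can be derived from per-request latencies as in the sketch below. This is an illustration only: latency_summary is a hypothetical helper, and the nearest-rank definition of p99 is an assumption, since the PR does not say how its percentiles were computed.

```python
import statistics


def latency_summary(latencies_s):
    """Return the median and nearest-rank p99 of a list of latencies in seconds."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    n = len(ordered)
    # Nearest-rank percentile: rank = ceil(0.99 * n), computed with integers
    # to avoid floating-point rounding surprises.
    rank = (99 * n + 99) // 100
    return {"median": statistics.median(ordered), "p99": ordered[rank - 1]}
```

With only a few hundred samples, the nearest-rank p99 is one of the slowest few requests, so it can shift noticeably between runs.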
Using prompt file prompts/prompt_together.json
Preparing questions...
Using all question(s) from data/instruct_basic_postgres.csv
Correct so far: 36/40 (90.00%): 100%|████████████████████████████████████████████████████████████| 40/40 [00:04<00:00,  8.09it/s]
                                   correct  error_db_exec
query_category                                           
basic_group_order_limit              1.000            0.0
basic_join_date_group_order_limit    0.750            0.0
basic_join_distinct                  0.875            0.0
basic_join_group_order_limit         0.875            0.0
basic_left_join                      1.000            0.0
Using prompt file prompts/prompt_together.json
Preparing questions...
Using all question(s) from data/instruct_advanced_postgres.csv
Correct so far: 14/18 (77.78%):  39%|███████████████████████▍                                    | 25/64 [00:05<00:06,  6.03it/s]
/home/ubuntu/sql-eval/eval/eval.py:584: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df_gen.fillna(-99999, inplace=True)
/home/ubuntu/sql-eval/eval/eval.py:585: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df_gold.fillna(-99999, inplace=True)
/home/ubuntu/sql-eval/eval/eval.py:584: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df_gen.fillna(-99999, inplace=True)
/home/ubuntu/sql-eval/eval/eval.py:585: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df_gold.fillna(-99999, inplace=True)
/home/ubuntu/sql-eval/eval/eval.py:584: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df_gen.fillna(-99999, inplace=True)
/home/ubuntu/sql-eval/eval/eval.py:585: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df_gold.fillna(-99999, inplace=True)
Correct so far: 43/64 (67.19%): 100%|████████████████████████████████████████████████████████████| 64/64 [00:14<00:00,  4.39it/s]
                              correct  error_db_exec
query_category                                      
instructions_cte_join           0.750          0.000
instructions_cte_window         0.500          0.000
instructions_date_join          0.500          0.125
instructions_string_matching    0.875          0.000
keywords_aggregate              0.875          0.000
keywords_ratio                  0.625          0.125
Using prompt file prompts/prompt_together.json
Preparing questions...
Using all question(s) from data/questions_gen_postgres.csv
Correct so far: 168/210 (80.00%): 100%|████████████████████████████████████████████████████████| 210/210 [00:23<00:00,  9.05it/s]
                 correct  error_db_exec
query_category                         
date_functions  0.600000       0.171429
group_by        0.914286       0.057143
instruct        0.857143       0.028571
order_by        0.828571       0.028571
ratio           0.828571       0.085714
table_join      0.771429       0.114286

Member

@rishsriv rishsriv left a comment

Thank you for adding this! And wow, those are some rapid speeds ⚡️

@rishsriv rishsriv merged commit 9330b5c into main Aug 30, 2024
2 checks passed
@rishsriv rishsriv deleted the jp/tgt branch August 30, 2024 04:50