Together.ai Runner #215

wongjingping · 2024-08-30T04:17:02Z

Changes

Add runner for together.ai's API. Quite similar to bedrock_runner.py
Extended generate_prompt.py to take in json files, where we have a list of messages with role and content. This returns a list of messages (dict) for the messages parameter in the openai create chat completions api.
Add table aliases as a different field table_aliases in the main dataframe, as cot_instructions is a more general and experimental field (e.g. for reasoning etc). We get prompt_together.json to read off the table_aliases field instead of using cot_instructions.

Testing

Ran the full evaluation over meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo as follows:

python3 main.py \
  -db postgres \
  -q data/instruct_basic_postgres.csv data/instruct_advanced_postgres.csv data/questions_gen_postgres.csv \
  -o results/together_llama_70b_basic.csv results/together_llama_70b_advanced.csv results/together_llama_70b_v1.csv \
  -g together \
  -f prompts/prompt_together.json \
  --cot_table_alias prealias \
  -m "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" \
  -c 0 \
  -p 10

Got the following results:

Question File	Accuracy
instruct_basic_postgres.csv	90.00%
instruct_advanced_postgres.csv	67.19%
questions_gen_postgres.csv	80.00%

The results are slightly better than the previous evaluation on bedrock, see #116.

Sppedwise it was extremely fast (at 10 concurrent requests):

median latency on v1 ws 0.99s
p99 latency on v1 was 2.43s

Using prompt file prompts/prompt_together.json
Preparing questions...
Using all question(s) from data/instruct_basic_postgres.csv
Correct so far: 36/40 (90.00%): 100%|████████████████████████████████████████████████████████████| 40/40 [00:04<00:00,  8.09it/s]
                                   correct  error_db_exec
query_category                                           
basic_group_order_limit              1.000            0.0
basic_join_date_group_order_limit    0.750            0.0
basic_join_distinct                  0.875            0.0
basic_join_group_order_limit         0.875            0.0
basic_left_join                      1.000            0.0
Using prompt file prompts/prompt_together.json
Preparing questions...
Using all question(s) from data/instruct_advanced_postgres.csv
Correct so far: 14/18 (77.78%):  39%|███████████████████████▍                                    | 25/64 [00:05<00:06,  6.03it/s]/home/ubuntu/sql-eval/eval/eval.py:584: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df_gen.fillna(-99999, inplace=True)
/home/ubuntu/sql-eval/eval/eval.py:585: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df_gold.fillna(-99999, inplace=True)
/home/ubuntu/sql-eval/eval/eval.py:584: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df_gen.fillna(-99999, inplace=True)
/home/ubuntu/sql-eval/eval/eval.py:585: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df_gold.fillna(-99999, inplace=True)
/home/ubuntu/sql-eval/eval/eval.py:584: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df_gen.fillna(-99999, inplace=True)
/home/ubuntu/sql-eval/eval/eval.py:585: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df_gold.fillna(-99999, inplace=True)
Correct so far: 43/64 (67.19%): 100%|████████████████████████████████████████████████████████████| 64/64 [00:14<00:00,  4.39it/s]
                              correct  error_db_exec
query_category                                      
instructions_cte_join           0.750          0.000
instructions_cte_window         0.500          0.000
instructions_date_join          0.500          0.125
instructions_string_matching    0.875          0.000
keywords_aggregate              0.875          0.000
keywords_ratio                  0.625          0.125
Using prompt file prompts/prompt_together.json
Preparing questions...
Using all question(s) from data/questions_gen_postgres.csv
Correct so far: 168/210 (80.00%): 100%|████████████████████████████████████████████████████████| 210/210 [00:23<00:00,  9.05it/s]
                 correct  error_db_exec
query_category                         
date_functions  0.600000       0.171429
group_by        0.914286       0.057143
instruct        0.857143       0.028571
order_by        0.828571       0.028571
ratio           0.828571       0.085714
table_join      0.771429       0.114286

lint

rishsriv

Thank you for adding this! And wow, those are some rapid speeds ⚡️

Together.ai Runner

2765925

wongjingping requested review from rishsriv and wendy-aw August 30, 2024 04:17

add bedrock instructions

aac73ee

lint

rishsriv approved these changes Aug 30, 2024

View reviewed changes

rishsriv merged commit 9330b5c into main Aug 30, 2024
2 checks passed

rishsriv deleted the jp/tgt branch August 30, 2024 04:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Together.ai Runner #215

Together.ai Runner #215

Uh oh!

wongjingping commented Aug 30, 2024

Uh oh!

rishsriv left a comment

Uh oh!

Uh oh!

Uh oh!

Together.ai Runner #215

Together.ai Runner #215

Uh oh!

Conversation

wongjingping commented Aug 30, 2024

Changes

Testing

Uh oh!

rishsriv left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!