Openai compatible gauntlet #1017

bmosaicml · 2024-03-08T00:39:25Z

OpenAI run: api-eval-Ik2iMA

| Category   | Benchmark       | Subtask                             |   Accuracy | Number few shot   | Model                         |
|:-----------|:----------------|:------------------------------------|-----------:|:------------------|:------------------------------|
|            | gsm8k           |                                     |   0.482942 | 0-shot            | openai/gpt-3.5-turbo-instruct |
|            | lambada_openai  |                                     |   0.782651 | 0-shot            | openai/gpt-3.5-turbo-instruct |
|            | triviaqa_sm_sub |                                     |   0.727667 | 3-shot            | openai/gpt-3.5-turbo-instruct |
|            | jeopardy        | Average                             |   0.553084 | 3-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | american_history                    |   0.602906 | 3-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | literature                          |   0.714286 | 3-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | science                             |   0.434874 | 3-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | word_origins                        |   0.372603 | 3-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | world_history                       |   0.640751 | 3-shot            | openai/gpt-3.5-turbo-instruct |
|            | arc_challenge   |                                     |   0.687713 | 25-shot           | openai/gpt-3.5-turbo-instruct |
|            | mmlu            | Average                             |   0.713291 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | abstract_algebra                    |   0.47     | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | anatomy                             |   0.674074 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | astronomy                           |   0.776316 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | business_ethics                     |   0.79     | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | clinical_knowledge                  |   0.750943 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | college_biology                     |   0.763889 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | college_chemistry                   |   0.53     | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | college_computer_science            |   0.57     | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | college_mathematics                 |   0.47     | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | college_medicine                    |   0.699422 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | college_physics                     |   0.54902  | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | computer_security                   |   0.81     | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | conceptual_physics                  |   0.67234  | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | econometrics                        |   0.570175 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | electrical_engineering              |   0.662069 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | elementary_mathematics              |   0.608466 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | formal_logic                        |   0.642857 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | global_facts                        |   0.48     | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_biology                 |   0.809677 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_chemistry               |   0.571429 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_computer_science        |   0.8      | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_european_history        |   0.70303  | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_geography               |   0.818182 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_government_and_politics |   0.906736 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_macroeconomics          |   0.720513 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_mathematics             |   0.507407 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_microeconomics          |   0.785714 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_physics                 |   0.509934 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_psychology              |   0.838532 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_statistics              |   0.564815 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_us_history              |   0.823529 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | high_school_world_history           |   0.763713 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | human_aging                         |   0.7713   | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | human_sexuality                     |   0.847328 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | international_law                   |   0.859504 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | jurisprudence                       |   0.768519 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | logical_fallacies                   |   0.809816 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | machine_learning                    |   0.625    | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | management                          |   0.815534 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | marketing                           |   0.884615 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | medical_genetics                    |   0.88     | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | miscellaneous                       |   0.872286 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | moral_disputes                      |   0.710983 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | moral_scenarios                     |   0.436871 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | nutrition                           |   0.761438 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | philosophy                          |   0.713826 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | prehistory                          |   0.783951 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | professional_accounting             |   0.56383  | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | professional_law                    |   0.557366 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | professional_medicine               |   0.768382 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | professional_psychology             |   0.73366  | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | public_relations                    |   0.790909 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | security_studies                    |   0.763265 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | sociology                           |   0.850746 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | us_foreign_policy                   |   0.93     | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | virology                            |   0.662651 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            |                 | world_religions                     |   0.883041 | 5-shot            | openai/gpt-3.5-turbo-instruct |
|            | hellaswag       |                                     |   0.706333 | 10-shot           | openai/gpt-3.5-turbo-instruct |
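
As a sanity check on the table, the jeopardy "Average" row looks like the unweighted mean of its five subtask rows. A minimal sketch using only the numbers reported above (the variable names are illustrative, not taken from the eval harness):

```python
from statistics import mean

# Jeopardy subtask accuracies copied from the results table above.
jeopardy = {
    "american_history": 0.602906,
    "literature": 0.714286,
    "science": 0.434874,
    "word_origins": 0.372603,
    "world_history": 0.640751,
}

# Unweighted (macro) average across subtasks.
jeopardy_average = mean(jeopardy.values())
print(round(jeopardy_average, 6))  # 0.553084, matching the reported Average row
```

Whether the mmlu "Average" row is computed the same way is not checked here.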

maxisawesome

Looking good. I have a couple of questions, but they mostly point to stuff that looks like WIP rather than code intended to be merged.

maxisawesome · 2024-04-19T21:36:04Z

mcli/mcli-dependent-deployment-eval.yaml

+    early_stopping_criteria:
+    - "\n\n"
+    - "Question:"
+  # - label: mmlu


Should these be uncommented or deleted?
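For context, the intent of the two stop strings in that YAML can be sketched as below: the generation is cut at the earliest occurrence of any of them. This is only an illustration of the idea; `truncate_at_stop` is a hypothetical helper and not how the eval harness necessarily implements `early_stopping_criteria`.

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# The two criteria from the YAML hunk above.
stops = ["\n\n", "Question:"]
generation = "The answer is 42.\n\nQuestion: What is 6 * 7?"
print(repr(truncate_at_stop(generation, stops)))  # 'The answer is 42.'
```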

maxisawesome · 2024-04-19T21:36:44Z

mcli/mcli-hf-eval.yaml

  cd llm-foundry/scripts
  composer eval/eval.py /mnt/config/parameters.yaml

 # Mosaic Cloud will use run_name (with a unique suffix) to populate the env var $RUN_NAME
-run_name: mpt-eval
+name: mpt-eval-logging


Is this a temporary change or permanent?

This goes for the whole file, I think?

maxisawesome · 2024-04-19T21:37:28Z

mcli/mcli-openai-eval.yaml

+name: api-eval
+cluster: r1z1  # replace with your cluster here!
+gpu_num: 8 #
+gpu_type: a100_80gb #


Probably temporary?

maxisawesome · 2024-04-19T21:38:22Z

scripts/eval/eval.py

+    if fsdp_config and model_cfg.model.get('load_in_8bit', False):
+        raise ValueError(
+            'The FSDP config block is not supported when loading ' +
+            'Hugging Face models in 8bit.')


This doesn't seem related to the OpenAI gauntlet, but it should probably be merged anyway?
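If it helps to review that check in isolation, here is a minimal sketch of the same guard as a standalone function. It assumes plain dict-like configs rather than the config objects used in `eval.py`, and `check_fsdp_with_8bit` is a hypothetical name:

```python
from typing import Any, Mapping, Optional


def check_fsdp_with_8bit(
    fsdp_config: Optional[Mapping[str, Any]],
    model_section: Mapping[str, Any],
) -> None:
    """Raise if an FSDP config is combined with 8-bit model loading."""
    # Same condition as the hunk above: a non-empty FSDP block plus load_in_8bit.
    if fsdp_config and model_section.get("load_in_8bit", False):
        raise ValueError(
            "The FSDP config block is not supported when loading "
            "Hugging Face models in 8bit."
        )


# No FSDP block: allowed, even with 8-bit loading.
check_fsdp_with_8bit(None, {"load_in_8bit": True})

# FSDP block plus 8-bit loading: rejected.
try:
    check_fsdp_with_8bit({"sharding_strategy": "FULL_SHARD"}, {"load_in_8bit": True})
except ValueError as err:
    print(err)
```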

maxisawesome · 2024-04-19T21:40:14Z

scripts/eval/yamls/long_context_tasks.yaml

-    name: hotpotqa
-    context_length: 65536
-    section: beginning
-    split: test


We probably don't want to delete these, do we?

maxisawesome · 2024-04-19T21:40:51Z

scripts/eval/yamls/openai_eval.yaml

-      model_name: gpt-3.5-turbo
-
-  model_name: openai/davinci
+  model_name: openai/gpt-3.5-turbo-instruct


Delete the old gpt-3.5-turbo throughout the codebase in favor of instruct?

maxisawesome · 2024-04-19T21:41:07Z

setup.py

@@ -114,8 +114,7 @@
 ]

 extra_deps['openai'] = [
-    'openai==1.3.8',
-    'tiktoken==0.4.0',
+    'openai==1.3.8', 'tiktoken==0.4.0', 'google-generativeai'


Should google-generativeai be here or in its own extra_deps category?
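To make the second option concrete, a rough sketch of what a separate category could look like. The `gemini` key and the `all` aggregate are assumptions for illustration only, not names taken from this PR:

```python
extra_deps = {}

# OpenAI extra left as it was before this change.
extra_deps["openai"] = [
    "openai==1.3.8",
    "tiktoken==0.4.0",
]

# Hypothetical separate extra for the Google client.
extra_deps["gemini"] = [
    "google-generativeai",
]

# Hypothetical aggregate extra: the de-duplicated union of every category.
extra_deps["all"] = sorted({dep for deps in extra_deps.values() for dep in deps})

print(extra_deps["all"])
```

With this split, installing only the OpenAI extra would not pull in the Google package.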

bmosaicml and others added 30 commits January 27, 2024 14:51

start

cd18e74

still need to migrate fixtures

1fffbad

Merge branch 'main' into migrate_subclasses_to_foundry

5a6e81c

wip onboarding tests

4aac81e

still workin'

946a4af

still wip

289ca55

maybe done; test out on mcli now

3696f8d

mcli

a20877d

remove calibration error

53da3ea

merge

16b8e32

migration

a90766e

migration

72ce793

Merge branch 'migrate_subclasses_to_foundry' of github.com:mosaicml/l…

667bdec

…lm-foundry into migrate_subclasses_to_foundry

full migration

ceff0c4

precommit

5bb06cc

fix

fe83828

fix pytests

b54a12b

refactor QA

71e8391

update

414153e

restore

a3f5a31

Merge branch 'main' into migrate_subclasses_to_foundry

820069a

add

4a1cd79

Merge branch 'migrate_subclasses_to_foundry' of github.com:mosaicml/l…

d265979

…lm-foundry into migrate_subclasses_to_foundry

wip

6cbaad4

Merge branch 'main' into migrate_subclasses_to_foundry

ddfd7b5

fix

71f77e3

wip

cb3725b

update readme

5135152

Merge branch 'main' into migrate_subclasses_to_foundry

18bae17

final pyright

c6162dd

bmosaicml and others added 19 commits April 4, 2024 15:30

Merge branch 'main' into migrate_subclasses_to_foundry

a60ef1d

Merge branch 'main' into migrate_subclasses_to_foundry

d78d783

Merge branch 'main' into migrate_subclasses_to_foundry

1ddf194

Merge branch 'main' into openai_compatible_gauntlet

78ac2d9

working

34c967b

fix typos

d5aebc8

Merge branch 'migrate_subclasses_to_foundry' of github.com:mosaicml/l…

d7272b1

…lm-foundry into migrate_subclasses_to_foundry

add deprecation warning for code

a5082b0

Merge branch 'migrate_subclasses_to_foundry' of github.com:mosaicml/l…

3c8ac56

…lm-foundry into migrate_subclasses_to_foundry

pyright wip

642ad40

Merge branch 'main' into migrate_subclasses_to_foundry

f30db14

fix pyright

de321b2

fix pyright error again

019c58a

fix pyright

779f490

fix pyright

03f7e91

Merge branch 'openai_compatible_gauntlet' of github.com:mosaicml/llm-…

bb2728b

…foundry into openai_compatible_gauntlet

Merge branch 'migrate_subclasses_to_foundry' into openai_compatible_g…

65f1a3e

…auntlet

Merge branch 'main' into openai_compatible_gauntlet

f493e35

Merge branch 'main' into openai_compatible_gauntlet

e62f584

maxisawesome mentioned this pull request Apr 14, 2024

Evaluation for long_context_tasks failed with a KeyError: 'continuation_indices' #1073

Closed

maxisawesome and others added 8 commits April 19, 2024 20:19

add api key to test

d23aa5f

Merge branch 'main' into openai_compatible_gauntlet

558538b

put openai imports behind typechecking

47a0cb9

correct typecheckign import

21de07a

Merge branch 'openai_compatible_gauntlet' of github.com:mosaicml/llm-…

7c1c32c

…foundry into openai_compatible_gauntlet

linting

5dce21d

fix gemni import errors

619e2ce

fix hf_eval

eec82a1

maxisawesome requested changes Apr 19, 2024

linting

6fc714b
