Add Qwen2.5-14B-Instruct #16

ethal0n · 2024-10-18T14:17:29Z

Hi! Thanks for your benchmark, I use it all the time. Can you please add Qwen/Qwen2.5-14B-Instruct?

According to the authors' tests, this model has close to the best size-to-performance ratio.

I saw that you added the 7B, 32B, and 72B models. According to the authors' tests, the 14B model is only 5% weaker than the 32B and 7% weaker than the 72B, while being about 2.3 and 5 times smaller, respectively. I would like to see your independent tests, especially on Zebra!

Thank you!

ethal0n · 2024-11-08T12:47:01Z

Added Qwen2.5-14B-Instruct (puzzle acc. 20.6), Qwen2.5-Coder-14B-Instruct (puzzle acc. 14.2), and Qwen2.5-Coder-32B-Instruct-AWQ (puzzle acc. 21.5).

╒══════════════════════════════════╤════════╤══════════╤══════════╤══════════════╤═══════════════════╤═══════════════════╤════════════╤═════════════╤═════════════════╤═══════════════╕
│              Model               │  Mode  │  N_Mode  │  N_Size  │  Puzzle Acc  │  Easy Puzzle Acc  │  Hard Puzzle Acc  │  Cell Acc  │  No answer  │  Total Puzzles  │  Reason Lens  │
╞══════════════════════════════════╪════════╪══════════╪══════════╪══════════════╪═══════════════════╪═══════════════════╪════════════╪═════════════╪═════════════════╪═══════════════╡
│      o1-preview-2024-09-12       │ greedy │  single  │    1     │     71.4     │       98.57       │       60.83       │   75.14    │     0.3     │      1000       │    1565.88    │
│     o1-preview-2024-09-12-v2     │ greedy │  single  │    1     │     70.4     │       98.21       │       59.58       │   74.18    │     0.4     │      1000       │    1559.71    │
│      o1-mini-2024-09-12-v3       │ greedy │  single  │    1     │     59.7     │       86.07       │       49.44       │   70.32    │      1      │      1000       │    1166.38    │
│      o1-mini-2024-09-12-v2       │ greedy │  single  │    1     │     56.8     │       82.86       │       46.67       │   69.87    │     1.3     │      1000       │    1164.95    │
│        o1-mini-2024-09-12        │ greedy │  single  │    1     │     52.6     │       87.14       │       39.17       │   52.29    │     0.8     │      1000       │    993.28     │
│    claude-3-5-sonnet-20241022    │ greedy │  single  │    1     │     36.2     │       91.07       │       14.86       │   54.27    │      0      │      1000       │    861.18     │
│    claude-3-5-sonnet-20240620    │ greedy │  single  │    1     │     33.4     │       87.5        │       12.36       │   54.34    │      0      │      1000       │    1141.94    │
│ Llama-3.1-405B-Inst-fp8@together │ greedy │  single  │    1     │     32.6     │       87.14       │       11.39       │    45.8    │    12.5     │      1000       │    314.66     │
│        gpt-4o-2024-08-06         │ greedy │  single  │    1     │     31.7     │       84.64       │       11.11       │   50.34    │     3.6     │      1000       │    1106.51    │
│     gemini-1.5-pro-exp-0827      │ greedy │  single  │    1     │     30.5     │       79.64       │       11.39       │   50.84    │     0.8     │      1000       │    1594.47    │
│  Llama-3.1-405B-Inst@sambanova   │ greedy │  single  │    1     │     30.1     │       84.64       │       8.89        │   39.06    │    24.7     │      1000       │    2001.12    │
│    chatgpt-4o-latest-24-09-07    │ greedy │  single  │    1     │     29.9     │       81.43       │       9.86        │   48.83    │     4.2     │      1000       │    1539.99    │
│         Mistral-Large-2          │ greedy │  single  │    1     │      29      │       80.36       │       9.03        │   47.64    │     1.7     │      1000       │    1592.39    │
│      gpt-4-turbo-2024-04-09      │ greedy │  single  │    1     │     28.4     │       80.71       │       8.06        │    47.9    │     0.1     │      1000       │    1148.46    │
│        gpt-4o-2024-05-13         │ greedy │  single  │    1     │     28.2     │       77.86       │       8.89        │   38.72    │    19.3     │      1000       │    1643.51    │
│            gpt-4-0314            │ greedy │  single  │    1     │     27.1     │       77.14       │       7.64        │   47.43    │     0.2     │      1000       │    1203.17    │
│      claude-3-opus-20240229      │ greedy │  single  │    1     │      27      │       78.21       │       7.08        │   48.91    │      0      │      1000       │    855.72     │
│       Qwen2.5-72B-Instruct       │ greedy │  single  │    1     │     26.6     │       76.43       │       7.22        │   40.92    │    11.9     │      1000       │    1795.9     │
│       Qwen2.5-32B-Instruct       │ greedy │  single  │    1     │     26.1     │       77.5        │       6.11        │   43.39    │     6.3     │      1000       │    1333.07    │
│     gemini-1.5-pro-exp-0801      │ greedy │  single  │    1     │     25.2     │       72.5        │       6.81        │    48.5    │      0      │      1000       │    1389.75    │
│    gemini-1.5-flash-exp-0827     │ greedy │  single  │    1     │      25      │       70.71       │       7.22        │   43.56    │     8.5     │      1000       │    1705.11    │
│  Llama-3.1-405B-Inst@hyperbolic  │ greedy │  single  │    1     │      25      │       66.67       │       15.38       │   46.62    │    6.25     │       16        │    1517.13    │
│   Meta-Llama-3.1-70B-Instruct    │ greedy │  single  │    1     │     24.9     │       73.57       │       5.97        │   27.98    │     43      │      1000       │    1483.68    │
│      deepseek-v2-chat-0628       │ greedy │  single  │    1     │     22.7     │       68.57       │       4.86        │   42.46    │     5.2     │      1000       │    1260.23    │
│        deepseek-v2.5-0908        │ greedy │  single  │    1     │     22.1     │       68.21       │       4.17        │   38.01    │    12.7     │      1000       │    1294.46    │
│  Qwen2.5-Coder-32B-Instruct-AWQ  │ greedy │  single  │    1     │     21.5     │       64.64       │       4.72        │   39.03    │    10.2     │      1000       │     744.9     │
│        Qwen2-72B-Instruct        │ greedy │  single  │    1     │     21.4     │       63.93       │       4.86        │   38.32    │    10.2     │      1000       │    1813.82    │
│      deepseek-v2-coder-0614      │ greedy │  single  │    1     │     21.1     │       64.64       │       4.17        │   41.58    │     4.9     │      1000       │    1324.55    │
│       Qwen2.5-14B-Instruct       │ greedy │  single  │    1     │     20.6     │       61.07       │       4.86        │   38.71    │    10.9     │      1000       │    662.52     │
│      deepseek-v2-coder-0724      │ greedy │  single  │    1     │     20.5     │       61.79       │       4.44        │   42.35    │     3.4     │      1000       │    1230.63    │
│      gpt-4o-mini-2024-07-18      │ greedy │  single  │    1     │     20.1     │       62.5        │       3.61        │   41.26    │     0.1     │      1000       │    943.52     │
│          gemini-1.5-pro          │ greedy │  single  │    1     │     19.4     │       55.71       │       5.28        │   44.59    │     0.8     │      1000       │    1336.17    │
│         gemini-1.5-flash         │ greedy │  single  │    1     │     19.4     │       59.29       │       3.89        │   31.77    │    22.7     │      1000       │    1538.18    │
│         yi-large-preview         │ greedy │  single  │    1     │     18.9     │       58.93       │       3.33        │   42.61    │     1.4     │      1000       │    833.36     │
│             yi-large             │ greedy │  single  │    1     │     18.8     │       58.21       │       3.47        │   39.83    │     1.8     │      1000       │    757.01     │
│     claude-3-sonnet-20240229     │ greedy │  single  │    1     │     18.7     │       58.93       │       3.06        │   43.66    │      0      │      1000       │    1095.37    │
│    claude-3-5-haiku-20241022     │ greedy │  single  │    1     │     18.7     │       57.86       │       3.47        │   43.22    │     0.1     │      1000       │    660.91     │
│    Meta-Llama-3-70B-Instruct     │ greedy │  single  │    1     │     16.8     │       52.86       │       2.78        │   42.31    │     0.2     │      1000       │    809.95     │
│            Athene-70B            │ greedy │  single  │    1     │     16.7     │       52.5        │       2.78        │   32.98    │    21.1     │      1000       │    391.19     │
│          gemma-2-27b-it          │ greedy │  single  │    1     │     16.3     │       50.71       │       2.92        │   41.18    │     1.1     │      1000       │    1014.56    │
│     claude-3-haiku-20240307      │ greedy │  single  │    1     │     14.3     │       47.86       │       1.25        │   37.87    │     0.1     │      1000       │    1015.06    │
│    Qwen2.5-Coder-14B-Instruct    │ greedy │  single  │    1     │     14.2     │       46.79       │       1.53        │    28.9    │    21.9     │      1000       │    1257.81    │
│          command-r-plus          │ greedy │  single  │    1     │     13.9     │       44.64       │       1.94        │   39.01    │     0.2     │      1000       │    810.53     │
│        reka-core-20240501        │ greedy │  single  │    1     │      13      │       43.21       │       1.25        │   33.88    │      4      │      1000       │    1078.29    │
│    Meta-Llama-3.1-8B-Instruct    │ greedy │  single  │    1     │     12.8     │       43.57       │       0.83        │   13.68    │    61.5     │      1000       │    1043.9     │
│          gemma-2-9b-it           │ greedy │  single  │    1     │     12.8     │       41.79       │       1.53        │   36.79    │      0      │      1000       │    849.84     │
│       Qwen2.5-7B-Instruct        │ greedy │  single  │    1     │      12      │       38.93       │       1.53        │   30.67    │     9.5     │      1000       │    850.93     │
│     Meta-Llama-3-8B-Instruct     │ greedy │  single  │    1     │     11.9     │       40.71       │       0.69        │    23.7    │    29.2     │      1000       │    1216.4     │
│    Mistral-Nemo-Instruct-2407    │ greedy │  single  │    1     │     11.8     │       38.93       │       1.25        │   34.93    │     1.6     │      1000       │    925.88     │
│      Phi-3-mini-4k-instruct      │ greedy │  single  │    1     │     11.6     │       38.21       │       1.25        │    13.5    │     59      │      1000       │    790.29     │
│         Yi-1.5-34B-Chat          │ greedy │  single  │    1     │     11.5     │       37.5        │       1.39        │   32.73    │     4.4     │      1000       │    869.65     │
│        gpt-3.5-turbo-0125        │ greedy │  single  │    1     │     10.1     │       33.57       │       0.97        │   33.06    │     0.1     │      1000       │    820.66     │
│            command-r             │ greedy │  single  │    1     │     9.9      │       32.14       │       1.25        │   32.66    │     1.5     │      1000       │    1005.17    │
│       reka-flash-20240226        │ greedy │  single  │    1     │     9.3      │       30.71       │       0.97        │   25.67    │    18.7     │      1000       │    1074.8     │
│        mathstral-7B-v0.1         │ greedy │  single  │    1     │      9       │        30         │       0.83        │   20.42    │     36      │      1000       │    1148.16    │
│    Mixtral-8x7B-Instruct-v0.1    │ greedy │  single  │    1     │     8.7      │       28.93       │       0.83        │   26.47    │    20.3     │      1000       │    1177.21    │
│        Qwen2-7B-Instruct         │ greedy │  single  │    1     │     8.4      │       29.29       │       0.28        │   22.06    │    24.4     │      1000       │    1473.23    │
│  Llama-3.2-3B-Instruct@together  │ greedy │  single  │    1     │     7.4      │       25.71       │       0.28        │   13.14    │    54.5     │      1000       │    963.47     │
│      Phi-3.5-mini-instruct       │ greedy │  single  │    1     │     6.4      │       21.79       │       0.42        │    5.98    │    80.6     │      1000       │    718.43     │
│       Qwen2.5-3B-Instruct        │ greedy │  single  │    1     │     4.8      │       17.14       │         0         │   11.44    │    56.7     │      1000       │    906.58     │
│          gemma-2-2b-it           │ greedy │  single  │    1     │     4.2      │       14.29       │       0.28        │    9.97    │    57.2     │      1000       │    1032.89    │
│          Yi-1.5-9B-Chat          │ greedy │  single  │    1     │     2.3      │       8.21        │         0         │    7.53    │    11.3     │      1000       │    1592.6     │
╘══════════════════════════════════╧════════╧══════════╧══════════╧══════════════╧═══════════════════╧═══════════════════╧════════════╧═════════════╧═════════════════╧═══════════════╛
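
For a rough check of the size/accuracy trade-off raised in the original request, the gap between the 14B model and its larger siblings can be computed from the Puzzle Acc column above. This is only a sketch: the sizes are nominal parameter counts (in billions) read off the model names, which is an assumption, and `RESULTS` and `compare` are hypothetical names used for illustration.

```python
# Puzzle Acc values are copied from the table above.
# Sizes are nominal parameter counts in billions taken from the
# model names (an assumption, not the exact parameter counts).
RESULTS = {
    "Qwen2.5-14B-Instruct": (14, 20.6),
    "Qwen2.5-32B-Instruct": (32, 26.1),
    "Qwen2.5-72B-Instruct": (72, 26.6),
}


def compare(small, large, results=RESULTS):
    """Return (size ratio, relative Puzzle Acc gap in percent)."""
    small_size, small_acc = results[small]
    large_size, large_acc = results[large]
    size_ratio = large_size / small_size
    # Gap is expressed relative to the larger model's accuracy.
    rel_gap = (large_acc - small_acc) / large_acc * 100
    return round(size_ratio, 2), round(rel_gap, 1)


if __name__ == "__main__":
    for large in ("Qwen2.5-32B-Instruct", "Qwen2.5-72B-Instruct"):
        ratio, gap = compare("Qwen2.5-14B-Instruct", large)
        print(f"{large}: {ratio}x larger, 14B is {gap}% lower on Puzzle Acc")
```

On this benchmark the relative gap works out to roughly 21% against the 32B model and 23% against the 72B model, for size ratios of about 2.3x and 5.1x.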
