# 🐟 Evolutionary Optimization of Model Merging Recipes

🤗 [Models](https://huggingface.co/SakanaAI) | 👀 [Demo](TODO) | 📚 [Paper](TODO) | 📝 [Blog](TODO) | 🐦 [Twitter](https://twitter.com/SakanaAILabs)

This repository serves as a central hub for SakanaAI's [Evolutionary Model Merge](TODO) series, showcasing its releases and resources. It includes models and code for reproducing the evaluation presented in our paper. More updates and additions are coming soon.

## Models

### Our Models

| Model | Size | License | Source |
| :-- | --: | :-- | :-- |
| [EvoLLM-JP-v1-7B](https://huggingface.co/SakanaAI/EvoLLM-JP-v1-7B) | 7B | Microsoft Research License | [shisa-gamma-7b-v1](https://huggingface.co/augmxnt/shisa-gamma-7b-v1), [WizardMath-7B-V1.1](https://huggingface.co/WizardLM/WizardMath-7B-V1.1), [Abel-7B-002](https://huggingface.co/GAIR/Abel-7B-002) |
| [EvoLLM-JP-v1-10B](https://huggingface.co/SakanaAI/EvoLLM-JP-v1-10B) | 10B | Microsoft Research License | EvoLLM-JP-v1-7B, [shisa-gamma-7b-v1](https://huggingface.co/augmxnt/shisa-gamma-7b-v1) |
| [EvoLLM-JP-A-v1-7B](https://huggingface.co/SakanaAI/EvoLLM-JP-A-v1-7B) | 7B | Apache 2.0 | [shisa-gamma-7b-v1](https://huggingface.co/augmxnt/shisa-gamma-7b-v1), [Arithmo2-Mistral-7B](https://huggingface.co/upaya07/Arithmo2-Mistral-7B), [Abel-7B-002](https://huggingface.co/GAIR/Abel-7B-002) |
| [EvoVLM-JP-v1-7B](https://huggingface.co/SakanaAI/EvoVLM-JP-v1-7B) | 7B | Apache 2.0 | [LLaVA-1.6-Mistral-7B](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b), [shisa-gamma-7b-v1](https://huggingface.co/augmxnt/shisa-gamma-7b-v1) |
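
All of these models are hosted on Hugging Face. As a quick way to try one of the LLMs, the snippet below is a minimal sketch using the standard `transformers` API; it assumes the checkpoints load through `AutoModelForCausalLM` and expose a chat template, and the prompt is illustrative only. (EvoVLM-JP-v1-7B is a vision-language model and requires an image processor, so it is not covered by this sketch.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any of the EvoLLM checkpoints in the table above can be substituted here.
model_id = "SakanaAI/EvoLLM-JP-v1-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# An illustrative MGSM-JA-style question: "There are 3 apples and 5 oranges.
# How many pieces of fruit are there in total?"
messages = [
    {"role": "user", "content": "りんごが3個、みかんが5個あります。果物は全部でいくつありますか？"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```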

### Comparing EvoLLM-JP w/ Source LLMs

For details on the evaluation, please refer to Section 4.1 of the paper.

| Model | MGSM-JA (acc ↑) | [lm-eval-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable) (avg ↑) |
| :-- | --: | --: |
| [Shisa Gamma 7B v1](https://huggingface.co/augmxnt/shisa-gamma-7b-v1) | 9.6 | 66.1 |
| [WizardMath 7B V1.1](https://huggingface.co/WizardLM/WizardMath-7B-V1.1) | 18.4 | 60.1 |
| [Abel 7B 002](https://huggingface.co/GAIR/Abel-7B-002) | 30.0 | 56.5 |
| [Arithmo2 Mistral 7B](https://huggingface.co/upaya07/Arithmo2-Mistral-7B) | 24.0 | 56.4 |
| [EvoLLM-JP-A-v1-7B](https://huggingface.co/SakanaAI/EvoLLM-JP-A-v1-7B) | **52.4** | **69.0** |
| [EvoLLM-JP-v1-7B](https://huggingface.co/SakanaAI/EvoLLM-JP-v1-7B) | **52.0** | **70.5** |
| [EvoLLM-JP-v1-10B](https://huggingface.co/SakanaAI/EvoLLM-JP-v1-10B) | **55.6** | **68.2** |


### Comparing EvoVLM-JP w/ Existing VLMs

For details on the evaluation, please see Section 4.2 of the paper.

| Model | JA-VG-VQA-500 (ROUGE-L ↑) | JA-VLM-Bench-In-the-Wild (ROUGE-L ↑) |
| :-- | --: | --: |
| [LLaVA-1.6-Mistral-7B](https://llava-vl.github.io/blog/2024-01-30-llava-next/) | 14.32 | 41.10 |
| [Japanese Stable VLM](https://huggingface.co/stabilityai/japanese-stable-vlm) | -<sup>*1</sup> | 40.50 |
| [Heron BLIP Japanese StableLM Base 7B llava-620k](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1-llava-620k) | 8.73<sup>*2</sup> | 27.37<sup>*2</sup> |
| [(Ours) EvoVLM-JP-v1-7B](https://huggingface.co/SakanaAI/EvoVLM-JP-v1-7B) | **19.70** | **51.25** |

* \*1: Japanese Stable VLM cannot be evaluated on the JA-VG-VQA-500 dataset because the model was trained on it.
* \*2: We are checking with the authors whether these results are valid.


## Reproducing the Evaluation

### 1. Clone the Repo

```bash
git clone https://github.com/SakanaAI/evolving-merged-models.git
cd evolving-merged-models
```

### 2. Download the fastText Model

We use fastText to detect language during evaluation. Please download `lid.176.ftz` from [this link](https://fasttext.cc/docs/en/language-identification.html) and place it in your current directory. If you place the file anywhere else, point the `LID176FTZ_PATH` environment variable at it:

```bash
export LID176FTZ_PATH="path-to-lid.176.ftz"
```
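
For reference, here is a minimal sketch of how the language-identification model can be queried with the `fasttext` Python package; the sample sentence and the fallback path are illustrative.

```python
import os
import fasttext

# Fall back to the current directory when LID176FTZ_PATH is not set.
model = fasttext.load_model(os.environ.get("LID176FTZ_PATH", "lid.176.ftz"))

# predict() returns language labels such as "__label__ja" with a confidence.
labels, scores = model.predict("これは日本語の文です。")  # "This is a Japanese sentence."
print(labels[0], scores[0])
```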

### 3. Install Libraries

```bash
pip install -e .
```

* We tested under the following environment:
* Python Version: 3.10
* CUDA Version: 12.3

### 4. Run

To launch the evaluation, run the following script with a config. All configs used for the paper are in `configs`.

```bash
python evaluate.py --config_path {path-to-config}
```
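
For example, pointing `--config_path` at one of the files under `configs` should reproduce the corresponding result from the paper, e.g. `python evaluate.py --config_path configs/<config-name>.yaml` (the file name here is a placeholder; use the actual names in the `configs` directory).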


## Acknowledgement

We would like to thank the developers of the source models for their contributions and for making their work available.
