diff --git a/README.md b/README.md
index 8cf3e0d..c359892 100644
--- a/README.md
+++ b/README.md
@@ -1,26 +1,27 @@
 # Evolutionary Optimization of Model Merging Recipes
 
-This is an official repository of [Evolutionary Optimization of Model Merging Recipes](https://arxiv.org/) to reproduce the results.
+This is an official repository of [Evolutionary Optimization of Model Merging Recipes](https://arxiv.org/TODO), containing code to reproduce the results.
 
 ## Model Zoo
 
 ### LLM
 
-| Model | MGSM-JA (acc ↑) | [lm-eval-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable) (Average ↑) |
-| :-- | --: | --: |
-| [shisa-gamma-7b-v1](https://huggingface.co/augmxnt/shisa-gamma-7b-v1) | 9.6 | 66.1 |
-| [WizardMath-7B-V1.1](https://huggingface.co/WizardLM/WizardMath-7B-V1.1) | 18.4 | 60.1 |
-| [Abel-7B-002](https://huggingface.co/GAIR/Abel-7B-002) | 30.0 | 56.5 |
-| [(Ours) EvoLLM-v1-JP-7B-A](https://huggingface.co/SakanaAI/EvoLLM-v1-JP-7B-A) | 52.4 | 69.0 |
-| [(Ours) EvoLLM-v1-JP-7B](https://huggingface.co/SakanaAI/EvoLLM-v1-JP-7B) | 52.0 | **70.5** |
-| [(Ours) EvoLLM-v1-JP-10B](https://huggingface.co/SakanaAI/EvoLLM-v1-JP-10B) | **55.6** | 68.2 |
+| Id. | Model | MGSM-JA (acc ↑) | [lm-eval-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable) (Average ↑) |
+| :--: | :-- | --: | --: |
+| 1 | [Shisa Gamma 7B v1](https://huggingface.co/augmxnt/shisa-gamma-7b-v1) | 9.6 | 66.1 |
+| 2 | [WizardMath 7B V1.1](https://huggingface.co/WizardLM/WizardMath-7B-V1.1) | 18.4 | 60.1 |
+| 3 | [Abel 7B 002](https://huggingface.co/GAIR/Abel-7B-002) | 30.0 | 56.5 |
+| 4 | [Arithmo2 Mistral 7B](https://huggingface.co/upaya07/Arithmo2-Mistral-7B) | 24.0 | 56.4 |
+| 5 | [(Ours) EvoLLM-v1-JP-7B-A](https://huggingface.co/SakanaAI/EvoLLM-v1-JP-7B-A) | **52.4** | **69.0** |
+| 6 | [(Ours) EvoLLM-v1-JP-7B](https://huggingface.co/SakanaAI/EvoLLM-v1-JP-7B) | **52.0** | **70.5** |
+| 7 | [(Ours) EvoLLM-v1-JP-10B](https://huggingface.co/SakanaAI/EvoLLM-v1-JP-10B) | **55.6** | **68.2** |
 
 ### VLM
 
-| Model | Ja-VG-VQA-500 (Ja-R-L ↑) | JaVLM-Bench-In-the-Wild (Ja-R-L ↑) |
+| Model | JA-VG-VQA-500 (ROUGE-L ↑) | JA-VLM-Bench-In-the-Wild (ROUGE-L ↑) |
 | :-- | --: | --: |
 | [LLaVA-1.6-Mistral-7B](https://llava-vl.github.io/blog/2024-01-30-llava-next/) | 14.32 | 41.10 |
-| [JSVLM](https://huggingface.co/stabilityai/japanese-stable-vlm) | - | 40.50 |
+| [Japanese Stable VLM](https://huggingface.co/stabilityai/japanese-stable-vlm) | - | 40.50 |
 | [Heron BLIP Japanese StableLM Base 7B llava-620k](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1-llava-620k)\* | 8.73 | 27.37 |
 | [(Ours) EvoVLM-v1-JP-7B](https://huggingface.co/SakanaAI/EvoVLM-v1-JP-7B) | **19.70** | **51.25** |
 
diff --git a/configs/llm/arithmo2-mistral-7b.yaml b/configs/llm/arithmo2-mistral-7b.yaml
new file mode 100644
index 0000000..3cec4ce
--- /dev/null
+++ b/configs/llm/arithmo2-mistral-7b.yaml
@@ -0,0 +1,10 @@
+model:
+  target: evofactory.CausalLMWithvLLM
+  params:
+    model_path: upaya07/Arithmo2-Mistral-7B
+    model_kwargs:
+      dtype: bfloat16
+    template: ja-alpaca-cot
+
+eval:
+  target: evofactory.eval.JaMGSM
\ No newline at end of file