diff --git a/CITATION.cff b/CITATION.cff
new file mode 100644
index 0000000..11e257a
--- /dev/null
+++ b/CITATION.cff
@@ -0,0 +1,24 @@
+cff-version: 1.2.0
+message: "Please cite our paper as below."
+authors:
+- family-names: "Cui"
+  given-names: "Yiming"
+  orcid: "https://orcid.org/0000-0002-2452-375X"
+- family-names: "Yao"
+  given-names: "Xin"
+title: "Chinese Mixtral"
+version: 1.0
+date-released: 2024-03-05
+url: "https://github.com/ymcui/Chinese-Mixtral"
+preferred-citation:
+  type: article
+  authors:
+  - family-names: "Cui"
+    given-names: "Yiming"
+    orcid: "https://orcid.org/0000-0002-2452-375X"
+  - family-names: "Yao"
+    given-names: "Xin"
+  title: "Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral"
+  journal: "arXiv pre-print"
+  year: 2024
+  url: "https://arxiv.org/abs/2403.01851"
\ No newline at end of file
diff --git a/README.md b/README.md
index d85eb92..ed9412b 100644
--- a/README.md
+++ b/README.md
@@ -14,6 +14,8 @@
 
 本项目基于Mistral.ai发布的[Mixtral模型](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)进行开发，该模型使用了稀疏混合专家模型(Sparse MoE)架构。本项目利用大规模中文无标注数据进行了中文增量训练，得到了**中文Mixtral**基础模型，并且进一步通过指令精调，得到了**中文Mixtral-Instruct**指令模型。该模型原生支持**32K上下文(实测可达128K)**，能够有效地处理长文本，同时在数学推理、代码生成等方面获得了显著性能提升。使用llama.cpp进行量化推理时，最低只需16G内存(或显存)。
 
+**技术报告**：[[Cui and Yao, 2024] Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral](https://arxiv.org/abs/2403.01851)
+
 #### 本项目主要内容
 
 - 🚀 开源中文Mixtral基础模型，该模型在[Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)的基础上进行了中文增量训练
@@ -29,7 +31,9 @@
 
 ## 新闻
 
-**[2024/01/29] 🚀 正式发布Chinese-Mixtral(基座模型)，Chinese-Mixtral-Instruct(指令/chat模型)。详情查看：[📚v1.0版本发布日志](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.0)**
+**[2024/03/05] 开源模型训练和精调代码，发布技术报告。详情查看：[📚v1.1版本发布日志](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.1)**
+
+[2024/01/29] 🚀 正式发布Chinese-Mixtral(基座模型)，Chinese-Mixtral-Instruct(指令/chat模型)。详情查看：[📚v1.0版本发布日志](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.0)
 
 ## 内容导引
@@ -246,11 +250,12 @@ Mixtral是一个稀疏混合专家模型。该模型与以往的LLaMA等主流
 ## 引用
 
 ```tex
-@misc{chinese-mixtral,
-  title={Chinese Mixtral},
-  author={Cui, Yiming and Yao, Xin},
-  howpublished={\url{https://github.com/ymcui/Chinese-Mixtral}},
-  year={2024}
+@article{chinese-mixtral,
+  title={Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral},
+  author={Cui, Yiming and Yao, Xin},
+  journal={arXiv preprint arXiv:2403.01851},
+  url={https://arxiv.org/abs/2403.01851},
+  year={2024}
 }
 ```
diff --git a/README_EN.md b/README_EN.md
index 09b578d..5e49faf 100644
--- a/README_EN.md
+++ b/README_EN.md
@@ -14,6 +14,8 @@
 
 This project is developed based on the [Mixtral model](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) released by Mistral.ai, which utilizes a Sparse Mixture of Experts (MoE) architecture. This project involves the use of large-scale Chinese unannotated data for incremental training in Chinese, resulting in the **Chinese Mixtral** base model. Further fine-tuning with instructions led to the creation of the **Chinese Mixtral-Instruct** instruction model. This model natively supports a **32K context (tested up to 128K)** and is capable of effectively processing long texts, while also showing significant performance improvements in areas like mathematical reasoning and code generation. When using llama.cpp for quantized inference, a minimum of only 16GB of memory (or VRAM) is required.
 
+**Paper**: [[Cui and Yao, 2024] Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral](https://arxiv.org/abs/2403.01851)
+
 #### Main Contents of This Project
 
 - 🚀 Open-sourced Chinese Mixtral base model, incrementally trained in Chinese on top of [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
@@ -29,7 +31,9 @@ This project is developed based on the [Mixtral model](https://huggingface.co/mi
 
 ## News
 
-**[2024/01/29] 🚀 Official release of Chinese-Mixtral (Base Model), Chinese-Mixtral-Instruct (Instruction/Chat Model). For more details, see: [📚 Version 1.0 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.0)**
+**[2024/03/05] Released the pre-training and fine-tuning scripts; the technical report is also available. See: [📚 v1.1 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.1)**
+
+[2024/01/29] 🚀 Official release of Chinese-Mixtral (Base Model), Chinese-Mixtral-Instruct (Instruction/Chat Model). For more details, see: [📚 v1.0 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.0)
 
 ## Content Guide
@@ -246,11 +250,12 @@ Question 3: Is the downstream ecosystem of Mixtral supported?
 ## Citation
 
 ```tex
-@misc{chinese-mixtral,
-  title={Chinese Mixtral},
-  author={Cui, Yiming and Yao, Xin},
-  howpublished={\url{https://github.com/ymcui/Chinese-Mixtral}},
-  year={2024}
+@article{chinese-mixtral,
+  title={Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral},
+  author={Cui, Yiming and Yao, Xin},
+  journal={arXiv preprint arXiv:2403.01851},
+  url={https://arxiv.org/abs/2403.01851},
+  year={2024}
 }
 ```
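
As a side note for reviewers, the sketch below shows one way the CITATION.cff added by this patch could be consumed programmatically. It is not part of the patch and only illustrates the structure of the file: it assumes PyYAML is installed, that the file sits at the repository root, and the variable names (`cff`, `pref`, `authors`, `entry`) are purely illustrative.

```python
# Illustrative only: read the CITATION.cff introduced in this patch and print a
# BibTeX-style entry assembled from its preferred-citation fields.
import yaml  # assumes PyYAML is available (pip install pyyaml)

with open("CITATION.cff", "r", encoding="utf-8") as f:
    cff = yaml.safe_load(f)

pref = cff["preferred-citation"]

# Join authors as "Family, Given and Family, Given", mirroring the BibTeX in the READMEs.
authors = " and ".join(
    f'{a["family-names"]}, {a["given-names"]}' for a in pref["authors"]
)

entry = (
    "@article{chinese-mixtral,\n"
    f"  title={{{pref['title']}}},\n"
    f"  author={{{authors}}},\n"
    f"  journal={{{pref['journal']}}},\n"
    f"  url={{{pref['url']}}},\n"
    f"  year={{{pref['year']}}}\n"
    "}"
)
print(entry)
```

GitHub also reads CITATION.cff directly to populate the repository's "Cite this repository" button, so the YAML file and the BibTeX blocks in the two READMEs should be kept in sync.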