
Releases: hiyouga/LLaMA-Factory

v0.9.0: Qwen2-VL, Liger-Kernel, Adam-mini

08 Sep 17:14

Congratulations on 30,000 stars 🎉 Follow us on X (Twitter)

New features

New models

  • Base models
    • Qwen2-Math (1.5B/7B/72B) 📄🔢
    • Yi-Coder (1.5B/9B) 📄
    • InternLM2.5 (1.8B/7B/20B) 📄
    • Gemma-2-2B 📄
    • Meta-Llama-3.1 (8B/70B) 📄
  • Instruct/Chat models
    • MiniCPM/MiniCPM3 (1B/2B/4B) by @LDLINGLINGLING in #4996 #5372 📄🤖
    • Qwen2-Math-Instruct (1.5B/7B/72B) 📄🤖🔢
    • Yi-Coder-Chat (1.5B/9B) 📄🤖
    • InternLM2.5-Chat (1.8B/7B/20B) 📄🤖
    • Qwen2-VL-Instruct (2B/7B) 📄🤖🖼️
    • Gemma-2-2B-it by @codemayq in #5037 📄🤖
    • Meta-Llama-3.1-Instruct (8B/70B) 📄🤖
    • Mistral-Nemo-Instruct (12B) 📄🤖

New datasets

  • Supervised fine-tuning datasets
    • Magpie-ultra-v0.1 (en) 📄
    • Pokemon-gpt4o-captions (en&zh) 📄🖼️
  • Preference datasets
    • RLHF-V (en) 📄🖼️
    • VLFeedback (en) 📄🖼️

Changes

  • Due to compatibility considerations, fine-tuning vision language models (VLMs) requires transformers>=4.45.0.dev0; run pip install git+https://github.com/huggingface/transformers.git to install it (see the sketch after this list).
  • The visual_inputs argument has been deprecated; you no longer need to specify it.
  • LlamaFactory now adopts lazy loading for multimodal inputs; see #5346 for details. Use preprocessing_batch_size to restrict the batch size in dataset pre-processing (supported by @naem1023 in #5323).
  • LlamaFactory now supports lmf (equivalent to llamafactory-cli) as a shortcut command.
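A minimal sketch of the updated VLM workflow under these changes, assuming a LoRA SFT setup; the model ID, the mllm_demo dataset, the qwen2_vl template, and the remaining config keys follow the project's example configs and are assumptions here, not part of this release note:

```bash
# Install the development build of transformers required for VLM fine-tuning.
pip install git+https://github.com/huggingface/transformers.git

# Minimal LoRA SFT config for a VLM; note there is no visual_inputs key anymore,
# and preprocessing_batch_size bounds the batch size used during pre-processing.
cat > qwen2vl_lora_sft.yaml <<'EOF'
model_name_or_path: Qwen/Qwen2-VL-7B-Instruct
stage: sft
do_train: true
finetuning_type: lora
dataset: mllm_demo
template: qwen2_vl
cutoff_len: 2048
preprocessing_batch_size: 128
output_dir: saves/qwen2vl-7b/lora/sft
per_device_train_batch_size: 1
learning_rate: 1.0e-4
num_train_epochs: 3.0
EOF

# Launch training; lmf is the new shortcut for llamafactory-cli.
lmf train qwen2vl_lora_sft.yaml
```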

Bug fix

v0.8.3: Neat Packing, Split Evaluation

18 Jul 18:00

New features

New models

  • Base models
    • InternLM2.5-7B 📄
    • Gemma2 (9B/27B) 📄
  • Instruct/Chat models
    • TeleChat-1B-Chat by @hzhaoy in #4651 📄🤖
    • InternLM2.5-7B-Chat 📄🤖
    • CodeGeeX4-9B-Chat 📄🤖
    • Gemma2-it (9B/27B) 📄🤖

Changes

  • Fix DPO cutoff len and deprecate reserved_label_len argument
  • Improve loss function for reward modeling

Bug fix

v0.8.2: PiSSA, Parallel Functions

19 Jun 13:06

New features

New models

  • Base models
    • DeepSeek-Coder-V2 (16B MoE/236B MoE) 📄
  • Instruct/Chat models
    • MiniCPM-2B 📄🤖
    • DeepSeek-Coder-V2-Instruct (16B MoE/236B MoE) 📄🤖

New datasets

Bug fix

v0.8.1: Patch release

10 Jun 16:50
  • Fix #2666: Unsloth+DoRA
  • Fix #4145: The PyTorch version of the docker image does not match the vLLM requirement
  • Fix #4160: The problem in LongLoRA implementation with the help of @f-q23
  • Fix #4167: The installation problem in the Windows system by @yzoaim

v0.8.0: GLM-4, Qwen2, PaliGemma, KTO, SimPO

07 Jun 22:26

Stronger LlamaBoard 💪😀

  • Support single-node distributed training in Web UI
  • Add dropdown menu for easily resuming from checkpoints and picking saved configurations by @hiyouga and @hzhaoy in #4053
  • Support selecting checkpoints of full/freeze tuning
  • Add throughput metrics to LlamaBoard by @injet-zhou in #4066
  • Faster UI loading

New features

  • Add the KTO algorithm by @enji-zhou in #3785 (see the sketch after this list)
  • Add SimPO algorithm by @hiyouga
  • Support passing max_lora_rank to the vLLM backend by @jue-jue-zi in #3794
  • Support preference datasets in sharegpt format and remove big files from git repo by @hiyouga in #3799
  • Support setting system messages in CLI inference by @ycjcl868 in #3812
  • Add num_samples option in dataset_info.json by @seanzhang-zhichen in #3829
  • Add NPU docker image by @dongdongqiang2018 in #3876
  • Improve the NPU documentation by @MengqingCao in #3930
  • Support SFT packing with greedy knapsack algorithm by @AlongWY in #4009
  • Add llamafactory-cli env for bug reports
  • Support image input in the API mode
  • Support random initialization via the train_from_scratch argument
  • Initialize CI
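As a pointer for the new preference-tuning options, here is a minimal sketch of a KTO LoRA run; the model ID, the kto_en_demo dataset, the pref_beta key, and the other config values are assumptions based on the project's example configs, not part of this release note:

```bash
# Minimal KTO + LoRA config (key names assumed from the project's example configs).
cat > glm4_lora_kto.yaml <<'EOF'
model_name_or_path: THUDM/glm-4-9b-chat
stage: kto                   # newly added KTO stage
do_train: true
finetuning_type: lora
pref_beta: 0.1               # preference-loss beta (assumed key name)
dataset: kto_en_demo         # KTO-format preference dataset
template: glm4
output_dir: saves/glm4-9b/lora/kto
per_device_train_batch_size: 1
learning_rate: 5.0e-6
num_train_epochs: 3.0
EOF

llamafactory-cli train glm4_lora_kto.yaml
```

The output of llamafactory-cli env can then be attached to any bug report about such a run.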

New models

  • Base models
    • Qwen2 (0.5B/1.5B/7B/72B/MoE) 📄
    • PaliGemma-3B (pt/mix) 📄🖼️
    • GLM-4-9B 📄
    • Falcon-11B 📄
    • DeepSeek-V2-Lite (16B) 📄
  • Instruct/Chat models
    • Qwen2-Instruct (0.5B/1.5B/7B/72B/MoE) 📄🤖
    • Mistral-7B-Instruct-v0.3 📄🤖
    • Phi-3-small-8k-instruct (7B) 📄🤖
    • Aya-23 (8B/35B) 📄🤖
    • OpenChat-3.6-8B 📄🤖
    • GLM-4-9B-Chat 📄🤖
    • TeleChat-12B-Chat by @hzhaoy in #3958 📄🤖
    • Phi-3-medium-8k-instruct (14B) 📄🤖
    • DeepSeek-V2-Lite-Chat (16B) 📄🤖
    • Codestral-22B-v0.1 📄🤖

New datasets

  • Pre-training datasets
    • FineWeb (en)
    • FineWeb-Edu (en)
  • Supervised fine-tuning datasets
    • Ruozhiba-GPT4 (zh)
    • STEM-Instruction (zh)
  • Preference datasets
    • Argilla-KTO-mix-15K (en)
    • UltraFeedback (en)

Bug fix

v0.7.1: Ascend NPU Support, Yi-VL Models

15 May 18:16

🚨🚨 Core refactor 🚨🚨

  • Add CLI usage: we now recommend using llamafactory-cli to launch training and inference (see the example below); the entry point is located at cli.py
  • Rename files: train_bash.py -> train.py, train_web.py -> webui.py, api_demo.py -> api.py
  • Remove files: cli_demo.py, evaluate.py, export_model.py, web_demo.py; use llamafactory-cli chat/eval/export/webchat instead
  • Use YAML configs in examples instead of shell scripts for better readability
  • Remove the sha1 hash check when loading datasets
  • Rename arguments: num_layer_trainable -> freeze_trainable_layers, name_module_trainable -> freeze_trainable_modules

The above changes were made by @hiyouga in #3596

REMINDER: Installation is now mandatory to use LLaMA Factory
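A minimal sketch of the refactored workflow, assuming an editable install from a local clone of the repo; the config file names are placeholders:

```bash
# Installation is now mandatory; install from a local clone of the repo.
pip install -e .

# The removed/renamed scripts map onto CLI subcommands:
llamafactory-cli train  path/to/train_config.yaml   # was train_bash.py
llamafactory-cli webui                              # was train_web.py
llamafactory-cli api    path/to/infer_config.yaml   # was api_demo.py
llamafactory-cli chat   path/to/infer_config.yaml   # was cli_demo.py
llamafactory-cli export path/to/merge_config.yaml   # was export_model.py
```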

New features

  • Support training and inference on Ascend NPU 910 devices by @zhou-wjjw and @statelesshz (Docker images are also provided)
  • Support stop parameter in vLLM engine by @zhaonx in #3527
  • Support fine-tuning token embeddings in freeze tuning via the freeze_extra_modules argument (see the sketch after this list)
  • Add Llama3 quickstart to readme
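A minimal sketch of freeze tuning that also trains the token embeddings via freeze_extra_modules; the model ID, dataset, module names, and the other config values are assumptions based on the project's example configs:

```bash
cat > llama3_freeze_sft.yaml <<'EOF'
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: sft
do_train: true
finetuning_type: freeze
freeze_trainable_layers: 2                    # tune only the last 2 decoder layers
freeze_extra_modules: embed_tokens,lm_head    # also tune the token embeddings
dataset: alpaca_gpt4_en
template: llama3
output_dir: saves/llama3-8b/freeze/sft
per_device_train_batch_size: 1
learning_rate: 1.0e-5
num_train_epochs: 3.0
EOF

llamafactory-cli train llama3_freeze_sft.yaml
```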

New models

  • Base models
    • Yi-1.5 (6B/9B/34B) 📄
    • DeepSeek-V2 (236B) 📄
  • Instruct/Chat models
    • Yi-1.5-Chat (6B/9B/34B) 📄🤖
    • Yi-VL-Chat (6B/34B) by @BUAADreamer in #3748 📄🖼️🤖
    • Llama3-Chinese-Chat (8B/70B) 📄🤖
    • DeepSeek-V2-Chat (236B) 📄🤖

Bug fix

v0.7.0: LLaVA Multimodal LLM Support

27 Apr 20:24

Congratulations on 20k stars 🎉 We were ranked 1st on GitHub Trending on Apr. 23rd 🔥 Follow us on X

New features

  • Support SFT/PPO/DPO/ORPO for the LLaVA-1.5 model by @BUAADreamer in #3450
  • Support inferring the LLaVA-1.5 model with both native Transformers and vLLM by @hiyouga in #3454
  • Support vLLM+LoRA inference for selected models (see the support list and the sketch after this list)
  • Support 2x faster generation for QLoRA models based on UnslothAI's optimization
  • Support adding new special tokens to the tokenizer via the new_special_tokens argument
  • Support choosing the device to merge LoRA in LlamaBoard via the export_device argument
  • Add a Colab notebook for fine-tuning the Llama-3 model on a free T4 GPU
  • Automatically enable SDPA attention and fast tokenizer for higher performance
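A minimal sketch of vLLM+LoRA inference using the chat entry point of this release; the model ID and adapter path are placeholders, and the argument names are assumptions based on the project's documentation of that period:

```bash
# Serve a LoRA adapter through the vLLM backend for faster chat inference
# (only models on the vLLM+LoRA support list apply).
python src/cli_demo.py \
  --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
  --adapter_name_or_path saves/llama3-8b/lora/sft \
  --template llama3 \
  --infer_backend vllm
```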

New models

  • Base models
    • OLMo-1.7-7B
    • Jamba-v0.1-51B
    • Qwen1.5-110B
    • DBRX-132B-Base
  • Instruct/Chat models
    • Phi-3-mini-3.8B-instruct (4k/128k)
    • LLaVA-1.5-7B
    • LLaVA-1.5-13B
    • Qwen1.5-110B-Chat
    • DBRX-132B-Instruct

New datasets

  • Supervised fine-tuning datasets
  • Preference datasets

Bug fix

v0.6.3: Llama-3 and 3x Longer QLoRA

21 Apr 15:43

New features

  • Support Meta Llama-3 (8B/70B) models
  • Support UnslothAI's long-context QLoRA optimization (56,000-token context length for Llama-2 7B on a 24GB GPU; see the sketch after this list)
  • Support previewing local datasets in directories in LlamaBoard by @codemayq in #3291
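A minimal sketch of the long-context QLoRA setup, using the pre-v0.7.1 train_bash.py launcher; the dataset, the RoPE scaling choice, and the exact argument names are assumptions based on the project's examples of that period:

```bash
# 4-bit QLoRA fine-tuning of Llama-2 7B with Unsloth's long-context optimization
# (roughly a 56k-token context on a single 24GB GPU, per the release note above).
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
  --stage sft \
  --do_train \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --dataset alpaca_gpt4_en \
  --template llama2 \
  --finetuning_type lora \
  --quantization_bit 4 \
  --use_unsloth \
  --rope_scaling linear \
  --cutoff_len 56000 \
  --output_dir saves/llama2-7b/qlora/sft
```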

New algorithms

New models

  • Base models
    • CodeGemma (2B/7B)
    • CodeQwen1.5-7B
    • Llama-3 (8B/70B)
    • Mixtral-8x22B-v0.1
  • Instruct/Chat models
    • CodeGemma-7B-it
    • CodeQwen1.5-7B-Chat
    • Llama-3-Instruct (8B/70B)
    • Command R (35B) by @marko1616 in #3254
    • Command R+ (104B) by @marko1616 in #3254
    • Mixtral-8x22B-Instruct-v0.1

Bug fix

v0.6.2: ORPO and Qwen1.5-32B

11 Apr 12:27

New features

  • Support ORPO algorithm by @hiyouga in #3066
  • Support inferring BNB 4-bit models on multiple GPUs via the quantization_device_map argument
  • Reorganize README files, move example scripts to the examples folder
  • Support saving & loading arguments quickly in LlamaBoard by @hiyouga and @marko1616 in #3046
  • Support loading alpaca-format datasets from the Hub without dataset_info.json by specifying --dataset_dir ONLINE (see the sketch after this list)
  • Add a moe_aux_loss_coef parameter to control the coefficient of the auxiliary loss in MoE models
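A minimal sketch of pulling an alpaca-format dataset straight from the Hub, using the pre-v0.7.1 train_bash.py launcher; the model ID, dataset ID, and template are placeholders:

```bash
# --dataset_dir ONLINE skips dataset_info.json and loads the dataset by its Hub ID.
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
  --stage sft \
  --do_train \
  --model_name_or_path Qwen/Qwen1.5-7B \
  --dataset_dir ONLINE \
  --dataset tatsu-lab/alpaca \
  --template qwen \
  --finetuning_type lora \
  --output_dir saves/qwen1.5-7b/lora/sft
```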

New models

  • Base models
    • Breeze-7B-Base
    • Qwen1.5-MoE-A2.7B (14B)
    • Qwen1.5-32B
  • Instruct/Chat models
    • Breeze-7B-Instruct
    • Qwen1.5-MoE-A2.7B-Chat (14B)
    • Qwen1.5-32B-Chat

Bug fix

v0.6.1: Patch release

29 Mar 04:07

This patch mainly fixes #2983

In commit 9bec3c9, we built the optimizer and scheduler inside the trainers, which inadvertently introduced a bug: when DeepSpeed was enabled, the trainers in transformers would build an optimizer and scheduler before calling the create_optimizer_and_scheduler method [1]; the optimizer created by our method would then overwrite the original one, while the scheduler would not. Consequently, the scheduler no longer affected the learning rate in the optimizer, leading to a regression in the training results. We have fixed this bug in 3bcd41b and 8c77b10. Thanks to @HideLord for helping us identify this critical bug.

[1] https://github.com/huggingface/transformers/blob/v4.39.1/src/transformers/trainer.py#L1877-L1881

We have also fixed #2961 #2981 #2982 #2983 #2991 #3010