
Commit

update readme
hiyouga committed Mar 28, 2024
1 parent 1e43319 commit c1fe6ce
Showing 3 changed files with 50 additions and 14 deletions.
26 changes: 15 additions & 11 deletions README.md
@@ -76,10 +76,10 @@ Compared to ChatGLM's [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/

[24/03/07] We supported gradient low-rank projection (**[GaLore](https://arxiv.org/abs/2403.03507)**) algorithm. See `examples/extras/galore` for usage.

[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `--infer_backend vllm` to enjoy **270%** inference speed. (LoRA is not yet supported, merge it first.)

<details><summary>Full Changelog</summary>

[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `--infer_backend vllm` to enjoy **270%** inference speed. (LoRA is not yet supported, merge it first.)

[24/02/28] We supported weight-decomposed LoRA (**[DoRA](https://arxiv.org/abs/2402.09353)**). Try `--use_dora` to activate DoRA training.

[24/02/15] We supported **block expansion** proposed by [LLaMA Pro](https://github.com/TencentARC/LLaMA-Pro). See `examples/extras/llama_pro` for usage.
@@ -586,7 +586,7 @@ CUDA_VISIBLE_DEVICES= python src/export_model.py \
> [!TIP]
> Use `--model_name_or_path path_to_export` solely to use the exported model.
>
> Use `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ after merging the LoRA weights.
> Use `CUDA_VISIBLE_DEVICES=0`, `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ after merging the LoRA weights.
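
For reference, a complete quantized-export command might look like the sketch below. It assumes the LoRA weights have already been merged into `path_to_export` as described above; the output directory, template value, and any flags beyond the two quantization options are illustrative placeholders rather than part of the original tip.

```bash
# Hypothetical sketch: quantize an already-merged model to 4-bit with AutoGPTQ.
# "path_to_export" is the merged model from the previous step; the export
# directory and template value are placeholders.
CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
    --model_name_or_path path_to_export \
    --template default \
    --export_dir path_to_quantized_export \
    --export_quantization_bit 4 \
    --export_quantization_dataset data/c4_demo.json
```
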
### Inference with OpenAI-style API

Expand Down Expand Up @@ -662,27 +662,31 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
### Dockerize Training

#### Get ready

A working container environment is required; install either Docker or Docker Compose before proceeding.

#### Docker support
#### Use Docker

```bash
docker build -f ./Dockerfile -t llama-factory:latest .

docker run --gpus=all -v ./hf_cache:/root/.cache/huggingface/ -v ./data:/app/data -v ./output:/app/output -p 7860:7860 --shm-size 16G --name llama_factory -d llama-factory:latest
docker run --gpus=all \
-v ./hf_cache:/root/.cache/huggingface/ \
-v ./data:/app/data \
-v ./output:/app/output \
-e CUDA_VISIBLE_DEVICES=0 \
-p 7860:7860 \
--shm-size 16G \
--name llama_factory \
-d llama-factory:latest
```
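
Once the container is running, a couple of standard Docker commands (not specific to this project) can be used to check on it. The container name `llama_factory` matches the `--name` flag above, and the web UI is assumed to listen on the mapped port 7860.

```bash
# Follow the container logs to watch startup progress.
docker logs -f llama_factory

# Assuming the image launches the LLaMA Board web UI on the mapped port,
# it should respond at http://localhost:7860 once startup finishes.
curl -I http://localhost:7860
```
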

#### Docker Compose support
#### Use Docker Compose

```bash
docker compose -f ./docker-compose.yml up -d
```
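
Likewise, a few generic Docker Compose commands are handy for inspecting or tearing down the stack; these are standard CLI usage rather than project-specific instructions.

```bash
# Tail logs from all services defined in the compose file.
docker compose -f ./docker-compose.yml logs -f

# Stop and remove the containers when finished.
docker compose -f ./docker-compose.yml down
```
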

> [!TIP]
> Details about volume:
> * hf_cache: Utilize Huggingface cache on the host machine. Reassignable if a cache already exists in a different directory.
> * hf_cache: Utilize Hugging Face cache on the host machine. Reassignable if a cache already exists in a different directory.
> * data: Place datasets in this directory on the host machine so that they can be selected in the LLaMA Board GUI.
> * output: Set the export directory to this location so that the merged result can be accessed directly on the host machine.
36 changes: 33 additions & 3 deletions README_zh.md
@@ -76,10 +76,10 @@ https://github.com/hiyouga/LLaMA-Factory/assets/16256802/ec36a9dd-37f4-4f72-81bd

[24/03/07] We supported the gradient low-rank projection (**[GaLore](https://arxiv.org/abs/2403.03507)**) algorithm. See `examples/extras/galore` for usage.

[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `--infer_backend vllm` to enjoy **270%** inference speed. (LoRA is not yet supported, merge the weights first.)

<details><summary>Full Changelog</summary>

[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `--infer_backend vllm` to enjoy **270%** inference speed. (LoRA is not yet supported, merge the weights first.)

[24/02/28] We supported weight-decomposed LoRA (**[DoRA](https://arxiv.org/abs/2402.09353)**) fine-tuning. Try `--use_dora` to activate DoRA training.

[24/02/15] We supported the **block expansion** method proposed by [LLaMA Pro](https://github.com/TencentARC/LLaMA-Pro). See `examples/extras/llama_pro` for usage.
@@ -585,7 +585,7 @@ CUDA_VISIBLE_DEVICES= python src/export_model.py \
> [!TIP]
> Use `--model_name_or_path path_to_export` solely to load the exported model.
>
> After merging the LoRA weights, use `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ.
> After merging the LoRA weights, use `CUDA_VISIBLE_DEVICES=0`, `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ.

### Inference with OpenAI-style API

@@ -659,6 +659,36 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
> [!TIP]
> We recommend using `--per_device_eval_batch_size=1` and `--max_target_length 128` for inference with quantized models.

### Dockerize Training

#### Use Docker

```bash
docker build -f ./Dockerfile -t llama-factory:latest .

docker run --gpus=all \
-v ./hf_cache:/root/.cache/huggingface/ \
-v ./data:/app/data \
-v ./output:/app/output \
-e CUDA_VISIBLE_DEVICES=0 \
-p 7860:7860 \
--shm-size 16G \
--name llama_factory \
-d llama-factory:latest
```

#### Use Docker Compose

```bash
docker compose -f ./docker-compose.yml up -d
```

> [!TIP]
> Details about volumes:
> * hf_cache: Use the Hugging Face cache folder on the host machine. It can be changed to a new directory.
> * data: The folder path on the host machine where datasets are stored, so that they can be selected in the LLaMA Board GUI.
> * output: Set the export directory to this path so that the merged model can be accessed directly on the host machine.

## Projects using LLaMA Factory

1. Wang et al. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation. 2023. [[arxiv]](https://arxiv.org/abs/2308.02223)
2 changes: 2 additions & 0 deletions docker-compose.yml
@@ -10,6 +10,8 @@ services:
- ./hf_cache:/root/.cache/huggingface/
- ./data:/app/data
- ./output:/app/output
environment:
- CUDA_VISIBLE_DEVICES=0
ports:
- "7860:7860"
ipc: host
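
With the environment section added, one quick way to confirm that the GPU restriction is in effect is to print the variable inside the running service. This is a generic verification sketch; `llama-factory` is a placeholder for whatever service name docker-compose.yml actually defines.

```bash
# Recreate the service so the new environment section takes effect, then
# print the variable inside the container. "llama-factory" is a placeholder
# for the service name defined in docker-compose.yml.
docker compose -f ./docker-compose.yml up -d
docker compose -f ./docker-compose.yml exec llama-factory printenv CUDA_VISIBLE_DEVICES
```
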
