diff --git a/README.md b/README.md
index baa850766e..af6ef66f65 100644
--- a/README.md
+++ b/README.md
@@ -76,10 +76,10 @@ Compared to ChatGLM's [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/
 
 [24/03/07] We supported gradient low-rank projection (**[GaLore](https://arxiv.org/abs/2403.03507)**) algorithm. See `examples/extras/galore` for usage.
 
-[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `--infer_backend vllm` to enjoy **270%** inference speed. (LoRA is not yet supported, merge it first.)
-
 <details><summary>Full Changelog</summary>
 
+[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `--infer_backend vllm` to enjoy **270%** inference speed. (LoRA is not yet supported, merge it first.)
+
 [24/02/28] We supported weight-decomposed LoRA (**[DoRA](https://arxiv.org/abs/2402.09353)**). Try `--use_dora` to activate DoRA training.
 
 [24/02/15] We supported **block expansion** proposed by [LLaMA Pro](https://github.com/TencentARC/LLaMA-Pro). See `examples/extras/llama_pro` for usage.
@@ -586,7 +586,7 @@ CUDA_VISIBLE_DEVICES= python src/export_model.py \
 > [!TIP]
 > Use `--model_name_or_path path_to_export` solely to use the exported model.
 >
-> Use `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ after merging the LoRA weights.
+> Use `CUDA_VISIBLE_DEVICES=0`, `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ after merging the LoRA weights.
 
 ### Inference with OpenAI-style API
 
@@ -662,19 +662,23 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
 
 ### Dockerize Training
 
-#### Get ready
-
-Necessary dockerized environment is needed, such as Docker or Docker Compose.
-
-#### Docker support
+#### Use Docker
 
 ```bash
 docker build -f ./Dockerfile -t llama-factory:latest .
 
-docker run --gpus=all -v ./hf_cache:/root/.cache/huggingface/ -v ./data:/app/data -v ./output:/app/output -p 7860:7860 --shm-size 16G --name llama_factory -d llama-factory:latest
+docker run --gpus=all \
+    -v ./hf_cache:/root/.cache/huggingface/ \
+    -v ./data:/app/data \
+    -v ./output:/app/output \
+    -e CUDA_VISIBLE_DEVICES=0 \
+    -p 7860:7860 \
+    --shm-size 16G \
+    --name llama_factory \
+    -d llama-factory:latest
 ```
 
-#### Docker Compose support
+#### Use Docker Compose
 
 ```bash
 docker compose -f ./docker-compose.yml up -d
@@ -682,7 +686,7 @@ docker compose -f ./docker-compose.yml up -d
 
 > [!TIP]
 > Details about volume:
-> * hf_cache: Utilize Huggingface cache on the host machine. Reassignable if a cache already exists in a different directory.
+> * hf_cache: Utilize Hugging Face cache on the host machine. Reassignable if a cache already exists in a different directory.
 > * data: Place datasets on this dir of the host machine so that they can be selected on LLaMA Board GUI.
 > * output: Set export dir to this location so that the merged result can be accessed directly on the host machine.
 
diff --git a/README_zh.md b/README_zh.md
index 5ef49549ad..d018ee32a4 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -76,10 +76,10 @@ https://github.com/hiyouga/LLaMA-Factory/assets/16256802/ec36a9dd-37f4-4f72-81bd
 
 [24/03/07] 我们支持了梯度低秩投影（**[GaLore](https://arxiv.org/abs/2403.03507)**）算法。详细用法请参照 `examples/extras/galore`。
 
-[24/03/07] 我们集成了 **[vLLM](https://github.com/vllm-project/vllm)** 以实现极速并发推理。请使用 `--infer_backend vllm` 来获得 **270%** 的推理速度。（尚不支持 LoRA，请先合并权重。）
-
 <details><summary>展开日志</summary>
 
+[24/03/07] 我们集成了 **[vLLM](https://github.com/vllm-project/vllm)** 以实现极速并发推理。请使用 `--infer_backend vllm` 来获得 **270%** 的推理速度。（尚不支持 LoRA，请先合并权重。）
+
 [24/02/28] 我们支持了 **[DoRA](https://arxiv.org/abs/2402.09353)** 微调。请使用 `--use_dora` 参数进行 DoRA 微调。
 
 [24/02/15] 我们支持了 [LLaMA Pro](https://github.com/TencentARC/LLaMA-Pro) 提出的**块扩展**方法。详细用法请参照 `examples/extras/llama_pro`。
@@ -585,7 +585,7 @@ CUDA_VISIBLE_DEVICES= python src/export_model.py \
 > [!TIP]
 > 仅使用 `--model_name_or_path path_to_export` 来加载导出后的模型。
 >
-> 合并 LoRA 权重之后可再次使用 `--export_quantization_bit 4` 和 `--export_quantization_dataset data/c4_demo.json` 基于 AutoGPTQ 量化模型。
+> 合并 LoRA 权重之后可再次使用 `CUDA_VISIBLE_DEVICES=0`、`--export_quantization_bit 4` 和 `--export_quantization_dataset data/c4_demo.json` 基于 AutoGPTQ 量化模型。
 
 ### 使用 OpenAI 风格 API 推理
 
@@ -659,6 +659,36 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
 > [!TIP]
 > 我们建议在量化模型的预测中使用 `--per_device_eval_batch_size=1` 和 `--max_target_length 128`。
 
+### 使用容器
+
+#### 使用 Docker
+
+```bash
+docker build -f ./Dockerfile -t llama-factory:latest .
+
+docker run --gpus=all \
+    -v ./hf_cache:/root/.cache/huggingface/ \
+    -v ./data:/app/data \
+    -v ./output:/app/output \
+    -e CUDA_VISIBLE_DEVICES=0 \
+    -p 7860:7860 \
+    --shm-size 16G \
+    --name llama_factory \
+    -d llama-factory:latest
+```
+
+#### 使用 Docker Compose
+
+```bash
+docker compose -f ./docker-compose.yml up -d
+```
+
+> [!TIP]
+> 数据卷详情：
+> * hf_cache：使用宿主机的 Hugging Face 缓存文件夹，允许更改为新的目录。
+> * data：宿主机中存放数据集的文件夹路径。
+> * output：将导出目录设置为该路径后，即可在宿主机中访问导出后的模型。
+
 ## 使用了 LLaMA Factory 的项目
 
 1. Wang et al. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation. 2023. [[arxiv]](https://arxiv.org/abs/2308.02223)
diff --git a/docker-compose.yml b/docker-compose.yml
index 9602a3e33c..333dc51e09 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -10,6 +10,8 @@ services:
       - ./hf_cache:/root/.cache/huggingface/
       - ./data:/app/data
       - ./output:/app/output
+    environment:
+      - CUDA_VISIBLE_DEVICES=0
     ports:
       - "7860:7860"
     ipc: host
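
Note on the updated export tip: the tip only lists the changed flags. A minimal sketch of the full command they imply is shown below; it is not part of this patch, and `--export_dir` plus the `path_to_gptq_export` placeholder are assumptions drawn from the surrounding README rather than from the diff.

```bash
# Sketch of the quantized export described by the revised tip (assumptions noted below).
# `path_to_export` is the merged model produced by the earlier export step;
# `--export_dir` and `path_to_gptq_export` are assumed placeholders, not taken from this diff.
CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
    --model_name_or_path path_to_export \
    --export_dir path_to_gptq_export \
    --export_quantization_bit 4 \
    --export_quantization_dataset data/c4_demo.json
```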
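Note on the container changes: both the `docker run` example and `docker-compose.yml` now pin the container to GPU 0 via `CUDA_VISIBLE_DEVICES=0`. A quick check that the variable reached the container is sketched below; it assumes the running container keeps the `llama_factory` name used in the examples above.

```bash
# Not part of the diff: confirm the GPU restriction inside the running container.
docker exec llama_factory env | grep CUDA_VISIBLE_DEVICES
# Expected output once the change is applied: CUDA_VISIBLE_DEVICES=0
```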