Commit dc48d28

s

1 parent 1dcb081 commit dc48d28

File tree

6 files changed: +231,402 −1 lines


data-sample/AdvertiseGen/dev.json

+1,070
Large diffs are not rendered by default.

data-sample/AdvertiseGen/train.json

+114,599
Large diffs are not rendered by default.

data-sample/AdvertiseGen_fix/dev.json

+1,070
Large diffs are not rendered by default.

data-sample/AdvertiseGen_fix/train.json

+114,599
Large diffs are not rendered by default.

data-sample/a.txt

+21

@@ -0,0 +1,21 @@
+Prefix dict has been built successfully.
+
+{'eval_rouge-1': 29.68116, 'eval_rouge-2': 6.219114000000001, 'eval_rouge-l': 21.761787999999996, 'eval_bleu-4': 0.02678721715966885, 'eval_runtime': 21.4969, 'eval_samples_per_second': 2.326, 'eval_steps_per_second': 0.186, 'epoch': 0.0}
+2%|▋ | 500/30000 [00:57<35:12, 13.96it/s]
+100%|█████████████████████████████████████████████| 4/4 [00:15<00:00, 4.07s/it]
+Saving model checkpoint to ./output/Lora_temp_test/tmp-checkpoint-500
+tokenizer config file saved in ./output/Lora_temp_test/tmp-checkpoint-500/tokenizer_config.json
+Special tokens file saved in ./output/Lora_temp_test/tmp-checkpoint-500/special_tokens_map.json
+/home/ps/anaconda3/envs/chatglm3/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
+
+
+dict has been built successfully.
+
+{'eval_rouge-1': 29.438044000000005, 'eval_rouge-2': 5.661042000000001, 'eval_rouge-l': 22.071056, 'eval_bleu-4': 0.02658810013458685, 'eval_runtime': 18.3051, 'eval_samples_per_second': 2.731, 'eval_steps_per_second': 0.219, 'epoch': 0.0}
+0%|▏ | 500/100000 [00:54<1:59:24, 13.89it/s]
+100%|█████████████████████████████████████████████| 4/4 [00:12<00:00, 2.87s/it]
+Checkpoint destination directory /mnt/sda/AmyGLM/finetune/output/Lora/checkpoint-500 already exists and is non-empty.Saving will proceed but saved results may be invalid.
+Saving model checkpoint to /mnt/sda/AmyGLM/finetune/output/Lora/checkpoint-5
+
+
+
docs/ai/chat-glm3.md

+43 −1

@@ -689,4 +689,46 @@ ValueError: mutable default <class 'transformers.training_args_seq2seq.Seq2SeqTr
training_args: Seq2SeqTrainingArguments = dc.field(
    default_factory=lambda: Seq2SeqTrainingArguments(output_dir='./output')
)
```
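For reference, a minimal self-contained version of that fix (the `FinetuningConfig` wrapper class is a hypothetical name for illustration; only the `training_args` field comes from the snippet above). The key point is that `default_factory` must be a zero-argument callable, so the `Seq2SeqTrainingArguments` instance is wrapped in a `lambda`:

```python
import dataclasses as dc
from transformers import Seq2SeqTrainingArguments

@dc.dataclass
class FinetuningConfig:  # hypothetical wrapper class, for illustration only
    # Using a bare Seq2SeqTrainingArguments instance as the default raises
    # the "mutable default" ValueError; default_factory needs a callable.
    training_args: Seq2SeqTrainingArguments = dc.field(
        default_factory=lambda: Seq2SeqTrainingArguments(output_dir='./output')
    )
```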
## Low-Cost Deployment

### Model Quantization

By default, the model is loaded in FP16 precision, and running the code above takes roughly 13GB of GPU memory. If your GPU memory is limited, you can try loading the model quantized, as follows:

```python
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
```

Quantization costs some model quality, but in testing, ChatGLM3-6B still generates natural, fluent text under 4-bit quantization.
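End to end, a quantized load looks like the sketch below; the `chat` call follows the usage pattern from the ChatGLM3 README, and the prompt is just a placeholder:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
# 4-bit quantization: needs far less GPU memory than the ~13GB FP16 load
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
model = model.eval()

response, history = model.chat(tokenizer, "Hello", history=[])  # placeholder prompt
print(response)
```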
### CPU Deployment

If you have no GPU at all, you can run inference on the CPU, although it will be much slower. Usage (requires roughly 32GB of RAM):

```python
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()
```
### Mac Deployment

On Macs with Apple Silicon or an AMD GPU, the MPS backend can run ChatGLM3-6B on the GPU. Follow Apple's [official instructions](https://developer.apple.com/metal/pytorch) to install PyTorch-Nightly (the correct version string looks like 2.x.x.dev2023xxxx, not 2.x.x).

Currently, macOS only supports [loading the model from a local path](README.md#从本地加载模型). Change the model loading in the code to load from a local path, and use the mps backend:

```python
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
```

Loading the half-precision ChatGLM3-6B model takes about 13GB of memory. Machines with less memory (such as a 16GB MacBook Pro) will fall back to virtual memory on disk once free RAM runs out, which slows inference down severely.
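A defensive variant of that load falls back when the MPS backend is missing; this is a minimal sketch, not the doc's own method. The availability checks are standard PyTorch APIs, and the path is still a placeholder:

```python
import torch
from transformers import AutoModel

# Prefer MPS on Apple Silicon / AMD Macs; otherwise fall back.
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to(device)
```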
### Multi-GPU Deployment

If you have multiple GPUs but none of them has enough memory to hold the full model, you can split the model across the GPUs. First install accelerate (`pip install accelerate`), then load the model like this:

```python
from utils import load_model_on_gpus

model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)
```

This deploys the model onto two GPUs for inference. You can change `num_gpus` to the number of GPUs you want to use. The split is even by default, but you can also pass a `device_map` argument to specify the split yourself.
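As an alternative to the repo's `utils` helper, transformers can shard the model itself once accelerate is installed; a sketch, not the script's own method, using the built-in `device_map="auto"` option of `from_pretrained`:

```python
from transformers import AutoModel

# Let accelerate spread the layers across all visible GPUs automatically.
model = AutoModel.from_pretrained(
    "THUDM/chatglm3-6b",
    trust_remote_code=True,
    device_map="auto",
)
```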
