Commit dc48d28

s

1 parent 1dcb081 commit dc48d28

File tree

6 files changed: +231,402 −1 lines


data-sample/AdvertiseGen/dev.json

+1,070
Large diffs are not rendered by default.

data-sample/AdvertiseGen/train.json

+114,599
Large diffs are not rendered by default.

data-sample/AdvertiseGen_fix/dev.json

+1,070
Large diffs are not rendered by default.

data-sample/AdvertiseGen_fix/train.json

+114,599
Large diffs are not rendered by default.

data-sample/a.txt

+21

@@ -0,0 +1,21 @@
+Prefix dict has been built successfully.
+
+{'eval_rouge-1': 29.68116, 'eval_rouge-2': 6.219114000000001, 'eval_rouge-l': 21.761787999999996, 'eval_bleu-4': 0.02678721715966885, 'eval_runtime': 21.4969, 'eval_samples_per_second': 2.326, 'eval_steps_per_second': 0.186, 'epoch': 0.0}
+2%|▋ | 500/30000 [00:57<35:12, 13.96it/s]
+100%|█████████████████████████████████████████████| 4/4 [00:15<00:00, 4.07s/it]
+Saving model checkpoint to ./output/Lora_temp_test/tmp-checkpoint-500
+tokenizer config file saved in ./output/Lora_temp_test/tmp-checkpoint-500/tokenizer_config.json
+Special tokens file saved in ./output/Lora_temp_test/tmp-checkpoint-500/special_tokens_map.json
+/home/ps/anaconda3/envs/chatglm3/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
+
+
+dict has been built successfully.
+
+{'eval_rouge-1': 29.438044000000005, 'eval_rouge-2': 5.661042000000001, 'eval_rouge-l': 22.071056, 'eval_bleu-4': 0.02658810013458685, 'eval_runtime': 18.3051, 'eval_samples_per_second': 2.731, 'eval_steps_per_second': 0.219, 'epoch': 0.0}
+0%|▏ | 500/100000 [00:54<1:59:24, 13.89it/s]
+100%|█████████████████████████████████████████████| 4/4 [00:12<00:00, 2.87s/it]
+Checkpoint destination directory /mnt/sda/AmyGLM/finetune/output/Lora/checkpoint-500 already exists and is non-empty.Saving will proceed but saved results may be invalid.
+Saving model checkpoint to /mnt/sda/AmyGLM/finetune/output/Lora/checkpoint-5
+
+
+
docs/ai/chat-glm3.md

+43 −1

@@ -689,4 +689,46 @@ ValueError: mutable default <class 'transformers.training_args_seq2seq.Seq2SeqTr
training_args: Seq2SeqTrainingArguments = dc.field(
    default_factory=lambda: Seq2SeqTrainingArguments(output_dir='./output')
)
```
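For reference, a minimal self-contained version of that fix (the `FinetuningConfig` wrapper class is a hypothetical name for illustration; only the `training_args` field comes from the snippet above). The key point is that `default_factory` must be a zero-argument callable, so the `Seq2SeqTrainingArguments` instance is wrapped in a `lambda`:

```python
import dataclasses as dc
from transformers import Seq2SeqTrainingArguments

@dc.dataclass
class FinetuningConfig:  # hypothetical wrapper class, for illustration only
    # Using a bare Seq2SeqTrainingArguments instance as the default raises
    # the "mutable default" ValueError; default_factory needs a callable.
    training_args: Seq2SeqTrainingArguments = dc.field(
        default_factory=lambda: Seq2SeqTrainingArguments(output_dir='./output')
    )
```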
## Low-Cost Deployment

### Model Quantization

By default, the model is loaded in FP16 precision, and running the code above takes roughly 13GB of GPU memory. If your GPU memory is limited, you can try loading the model quantized, as follows:

```python
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
```

Quantization costs some model quality, but in testing, ChatGLM3-6B still generates natural, fluent text under 4-bit quantization.
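End to end, a quantized load looks like the sketch below; the `chat` call follows the usage pattern from the ChatGLM3 README, and the prompt is just a placeholder:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
# 4-bit quantization: needs far less GPU memory than the ~13GB FP16 load
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
model = model.eval()

response, history = model.chat(tokenizer, "Hello", history=[])  # placeholder prompt
print(response)
```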
### CPU Deployment

If you have no GPU at all, you can run inference on the CPU, although it will be much slower. Usage (requires roughly 32GB of RAM):

```python
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()
```
### Mac Deployment

On Macs with Apple Silicon or an AMD GPU, the MPS backend can run ChatGLM3-6B on the GPU. Follow Apple's [official instructions](https://developer.apple.com/metal/pytorch) to install PyTorch-Nightly (the correct version string looks like 2.x.x.dev2023xxxx, not 2.x.x).

Currently, macOS only supports [loading the model from a local path](README.md#从本地加载模型). Change the model loading in the code to load from a local path, and use the mps backend:

```python
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
```

Loading the half-precision ChatGLM3-6B model takes about 13GB of memory. Machines with less memory (such as a 16GB MacBook Pro) will fall back to virtual memory on disk once free RAM runs out, which slows inference down severely.
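A defensive variant of that load falls back when the MPS backend is missing; this is a minimal sketch, not the doc's own method. The availability checks are standard PyTorch APIs, and the path is still a placeholder:

```python
import torch
from transformers import AutoModel

# Prefer MPS on Apple Silicon / AMD Macs; otherwise fall back.
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to(device)
```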
### Multi-GPU Deployment

If you have multiple GPUs but none of them has enough memory to hold the full model, you can split the model across the GPUs. First install accelerate (`pip install accelerate`), then load the model like this:

```python
from utils import load_model_on_gpus

model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)
```

This deploys the model onto two GPUs for inference. You can change `num_gpus` to the number of GPUs you want to use. The split is even by default, but you can also pass a `device_map` argument to specify the split yourself.
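As an alternative to the repo's `utils` helper, transformers can shard the model itself once accelerate is installed; a sketch, not the script's own method, using the built-in `device_map="auto"` option of `from_pretrained`:

```python
from transformers import AutoModel

# Let accelerate spread the layers across all visible GPUs automatically.
model = AutoModel.from_pretrained(
    "THUDM/chatglm3-6b",
    trust_remote_code=True,
    device_map="auto",
)
```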
