text generation webui
Next, we use text-generation-webui as an example to walk through local deployment without merging the models.
Run the following commands to clone text-generation-webui and install its dependencies:
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
Put the downloaded LoRA weights and the HuggingFace-format llama-7B model weights into the loras and models folders, respectively, as shown below:
ls loras/chinese-alpaca-lora-7b
adapter_config.json adapter_model.bin special_tokens_map.json tokenizer_config.json tokenizer.model
ls models/llama-7b-hf
pytorch_model-00001-of-00002.bin pytorch_model-00002-of-00002.bin config.json pytorch_model.bin.index.json generation_config.json
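Before going further, it can help to confirm that both folders contain the expected files. The following is a minimal sketch, not part of text-generation-webui: the helper name `missing_files` is hypothetical, and the file lists are copied from the listings above. Weight shards are left out of the check because their number depends on how the checkpoint was saved.

```python
from pathlib import Path

# File names copied from the directory listings above.
LORA_FILES = [
    "adapter_config.json",
    "adapter_model.bin",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.model",
]
# Weight shards (pytorch_model-0000X-of-0000Y.bin) are not checked here,
# since the shard count is an assumption that may not hold for every checkpoint.
MODEL_FILES = [
    "config.json",
    "generation_config.json",
    "pytorch_model.bin.index.json",
]


def missing_files(directory, expected):
    """Return the names in `expected` that are not regular files in `directory`."""
    root = Path(directory)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from the text-generation-webui root directory.
    for directory, expected in [
        ("loras/chinese-alpaca-lora-7b", LORA_FILES),
        ("models/llama-7b-hf", MODEL_FILES),
    ]:
        missing = missing_files(directory, expected)
        print(directory, "OK" if not missing else f"missing: {missing}")
```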
Copy the tokenizer files from the LoRA weights into the models/llama-7b-hf directory, then modify modules/LoRA.py:
cp loras/chinese-alpaca-lora-7b/tokenizer.model models/llama-7b-hf/
cp loras/chinese-alpaca-lora-7b/special_tokens_map.json models/llama-7b-hf/
cp loras/chinese-alpaca-lora-7b/tokenizer_config.json models/llama-7b-hf/
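On systems without `cp`, the same three copies can be done from Python. This is a small sketch using only the standard library; the function name `copy_tokenizer` is hypothetical, and the paths in the usage note are the ones used in the commands above.

```python
import shutil
from pathlib import Path

# The three tokenizer files copied by the cp commands above.
TOKENIZER_FILES = [
    "tokenizer.model",
    "special_tokens_map.json",
    "tokenizer_config.json",
]


def copy_tokenizer(lora_dir, model_dir):
    """Copy the tokenizer files from `lora_dir` into `model_dir`.

    Existing files with the same name in `model_dir` are overwritten.
    Raises FileNotFoundError if a source file is missing.
    """
    src, dst = Path(lora_dir), Path(model_dir)
    copied = []
    for name in TOKENIZER_FILES:
        shutil.copy2(src / name, dst / name)
        copied.append(name)
    return copied
```

For the layout in this guide, the call would be `copy_tokenizer("loras/chinese-alpaca-lora-7b", "models/llama-7b-hf")`, run from the text-generation-webui root.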
Modifying modules/LoRA.py is as simple as adding one line before the PeftModel.from_pretrained call:
shared.model.resize_token_embeddings(len(shared.tokenizer))
shared.model = PeftModel.from_pretrained(shared.model, Path(f"{shared.args.lora_dir}/{lora_name}"), **params)
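The added line resizes the model's token embedding table to the length of the tokenizer. Presumably this is needed because the tokenizer copied from the LoRA directory does not have the same vocabulary size as the base model. The toy sketch below illustrates the idea with plain Python lists. It is not the transformers implementation: the function `resize_embeddings` is hypothetical, the sizes are made up, and the real method works on the model's tensors and may initialise new rows differently.

```python
import random


def resize_embeddings(table, new_size, seed=0):
    """Return a copy of `table` with exactly `new_size` rows.

    Existing rows are kept (or truncated); any added rows are filled
    with small random values, as a stand-in for newly initialised embeddings.
    """
    rng = random.Random(seed)
    dim = len(table[0])
    kept = [row[:] for row in table[:new_size]]
    extra = [
        [rng.gauss(0.0, 0.02) for _ in range(dim)]
        for _ in range(new_size - len(kept))
    ]
    return kept + extra


# Toy numbers only: a base "model" with 4 token rows of dimension 3,
# and a tokenizer that knows 6 tokens.
base_table = [[float(i)] * 3 for i in range(4)]
tokenizer_vocab_size = 6

resized = resize_embeddings(base_table, tokenizer_vocab_size)
print(len(base_table), "->", len(resized))
```

Without the resize, token ids at or above the original table size would have no embedding row to look up, which is why the call has to come before the LoRA weights are loaded.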
Run the following command to talk to chinese-llama/alpaca:
python server.py --model llama-7b-hf --lora chinese-alpaca-lora-7b --cpu
Please refer to the webui guide on using LoRAs for instructions on how to use LoRAs. In addition, we recommend running the merged chinese-alpaca-7b directly, which greatly improves inference speed compared with loading the base and LoRA weights separately.
If you want to apply Chinese-Alpaca-Plus, please follow the steps below:
- Use merge_llama_with_chinese_lora.py to obtain a single set of model weights in HF format:
python scripts/merge_llama_with_chinese_lora.py \
--base_model path_to_hf_llama \
--lora_model path_to_chinese_llama_plus_lora,path_to_chinese_alpaca_plus_lora \
--output_type huggingface \
--output_dir path_to_webui/models/merged_chinese_alpaca_plus
- Run the following command to talk to chinese-alpaca-plus:
python server.py --model merged_chinese_alpaca_plus --cpu
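The merge command in the first step takes both LoRA paths as one comma-separated value. If you prefer to assemble it from a script, the following sketch builds the same argument list. The function name `build_merge_command` is hypothetical. The flags are exactly those shown above and the paths are the same placeholders.

```python
import shlex


def build_merge_command(base_model, lora_models, output_dir):
    """Build the argv list for merge_llama_with_chinese_lora.py.

    `lora_models` is an ordered list of LoRA paths; they are joined with
    commas in the order given, matching the command shown above.
    """
    return [
        "python",
        "scripts/merge_llama_with_chinese_lora.py",
        "--base_model", base_model,
        "--lora_model", ",".join(lora_models),
        "--output_type", "huggingface",
        "--output_dir", output_dir,
    ]


cmd = build_merge_command(
    "path_to_hf_llama",
    ["path_to_chinese_llama_plus_lora", "path_to_chinese_alpaca_plus_lora"],
    "path_to_webui/models/merged_chinese_alpaca_plus",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real ones.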