We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Qwen2.5
Qwen/Qwen2.5-0.5B-Instruct
inference with transformers vllm
我通过 lora 对Qwen/Qwen2.5-0.5B-Instruct 进行了微调,随后进行 merge, 之前merge lora 后的模型也出现了这个问题,但是今天lora merge我改了model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.bfloat16)(即与 train 的 model 加载保持一致),merge后就不会出现”human:xxx"的回答,随后但是拿merge之后的模型去做autogptq,随后推理,结果又这样了,请问是什么原因?
import os import json import random import torch from tqdm import tqdm from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig from transformers import AutoTokenizer os.environ["CUDA_VISIBLE_DEVICES"] = "7" # Specify paths and hyperparameters for quantization model_path = "model/qwen2.5-0.5B-Instruct-ner-lora-v1" quant_path = "model/qwen2.5-0.5B-Instruct-ner-lora-int4" quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=True) tokenizer = AutoTokenizer.from_pretrained(model_path) model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config, device_map="cuda:0", torch_dtype=torch.bfloat16).to("cuda") raw_datas = [] with open("train/qwen_clean_train.json", "r") as f: for line in f: json_line = json.loads(line) raw_datas.append(json_line) data = [] select_samples = random.sample(raw_datas, 100) # Randomly select 100 samples print(select_samples[:1]) # Quantize the model model.quantize(data) model.save_quantized(quant_path, use_safetensors=True) tokenizer.save_pretrained(quant_path) for sample in tqdm(select_samples): messages = [ {"role": "system", "content": """你是专门进行实体抽取的专家。"""}, {"role": "user", "content": f'"input": "{sample["text"]}"'}, {"role": "assistant", "content": ""} ] text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False) model_inputs = tokenizer([text]) input_ids = torch.tensor(model_inputs.input_ids, dtype=torch.int).to("cuda") data.append(dict(input_ids=input_ids, attention_mask=input_ids.ne(tokenizer.pad_token_id))) model.quantize(data) model.save_quantized(quant_path, use_safetensors=True) tokenizer.save_pretrained(quant_path)
以上为merge代码,推理代码使用transformers和vllm
Input Prompt: [{'role': 'system', 'content': '你是专门进行实体抽取的专家。请从input中抽取出符合schema定义的实体,只需抽取出存在的实体类型,不存在的实体类型无需输出。请按照JSON字符串的格式回答。}, {'role': 'user', 'content': '"input": "我想咨询单纯疱疹"'}, {'role': 'assistant', 'content': ''}] Model Output: {"疾病": ["单纯疱疹"]}Human: 请问如何才能快速去除一个水滴状的物体?我需要一些方法来解决这个问题。
The text was updated successfully, but these errors were encountered:
我用1.5B + 2k多样本,用llama factory进行lora微调,做匹配任务,效果反而比不微调的差。🤔 你用了多少训练样本?loss到多少?
Sorry, something went wrong.
loss,0.05,我的问题是微调后量化,出现离谱输出,本身微调后效果是很好的
looks like the GPTQ-quantized model failed to generate <|im_end|>. try AWQ?
No branches or pull requests
Model Series
Qwen2.5
What are the models used?
Qwen/Qwen2.5-0.5B-Instruct
What is the scenario where the problem happened?
inference with transformers vllm
Is this a known issue?
Information about environment
我通过 lora 对Qwen/Qwen2.5-0.5B-Instruct 进行了微调,随后进行 merge, 之前merge lora 后的模型也出现了这个问题,但是今天lora merge我改了model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.bfloat16)(即与 train 的 model 加载保持一致),merge后就不会出现”human:xxx"的回答,随后但是拿merge之后的模型去做autogptq,随后推理,结果又这样了,请问是什么原因?
以上为merge代码,推理代码使用transformers和vllm
Log output
Description
我通过 lora 对Qwen/Qwen2.5-0.5B-Instruct 进行了微调,随后进行 merge, 之前merge lora 后的模型也出现了这个问题,但是今天lora merge我改了model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.bfloat16)(即与 train 的 model 加载保持一致),merge后就不会出现”human:xxx"的回答,随后但是拿merge之后的模型去做autogptq,随后推理,结果又这样了,请问是什么原因?
The text was updated successfully, but these errors were encountered: