The random number generator (RNG): if you are using a pseudo-RNG, you can control it with a fixed seed.
Differences in implementations: the two implementations are not guaranteed to be identical; always using the same framework helps.
The accuracy limits of floating-point arithmetic: in particular, floating-point addition and multiplication are not necessarily associative, so if the order of execution varies, the results may differ. Using higher precision (e.g., float32) or deterministic algorithms may help (https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms); see the sketch below.
In general, these factors do not affect evaluation significantly.
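As a minimal sketch of the seed and determinism points (assuming a PyTorch-based stack; the seed value and the CUBLAS_WORKSPACE_CONFIG setting are illustrative, not taken from the original report):

import os
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required for deterministic cuBLAS kernels

import random
import numpy as np
import torch

SEED = 42  # any fixed value works; illustrative
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

# Prefer deterministic kernels; ops without a deterministic implementation will raise an error
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False

Note that this only removes nondeterminism within a single framework; it does not make vLLM and Transformers agree with each other, since their kernels differ.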
This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.
vLLM 0.6.5
transformers 4.41.2
vllm:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained("/data/models/Qwen2-7B-Instruct")
# Greedy decoding: temperature=0, no penalties
sampling_params = SamplingParams(temperature=0.0, repetition_penalty=1.0, max_tokens=2048, best_of=1, top_k=-1, top_p=1)
llm = LLM(model="/data/models/Qwen2-7B-Instruct",
          dtype='float16',
          gpu_memory_utilization=0.9,
          enforce_eager=True,
          trust_remote_code=True
          )

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
outputs = llm.generate([text], sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
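With temperature=0.0 this is already greedy decoding, so the sampling RNG should not matter here; if sampling were enabled later, vLLM also allows pinning a seed. A hedged sketch (the seed value is illustrative):

# engine-level seed
llm = LLM(model="/data/models/Qwen2-7B-Instruct", seed=0)
# per-request seed, only relevant once temperature > 0
sampling_params = SamplingParams(temperature=0.7, seed=0)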
hf:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "/data/models/Qwen2-7B-Instruct"

def huggingface(messages):
    device = "cuda"  # the device to move the inputs onto
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,
        device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    print(model_inputs)
    # Greedy decoding: do_sample=False, single beam, no penalties
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=2048,
        do_sample=False,
        num_beams=1,
        temperature=0,
        repetition_penalty=1.0,
    )
    # Strip the prompt tokens so only the newly generated tokens are decoded
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
huggingface(messages)
vLLM result:
A large language model is a type of artificial intelligence (AI) model that has been trained on large amounts of text data and can understand and generate human-like language. These models typically consist of multiple layers of interconnected artificial neurons that process input data and turn it into output predictions.\n\nThe training process involves feeding the model large datasets such as books, articles, and web pages so that it can learn the patterns and relationships in the text. Once trained, these models can generate coherent and contextually relevant text, making them useful for a variety of applications such as language translation, text summarization, and chatbot development. Some of the best-known large language models include GPT (Generative Pre-trained Transformer), BERT (Bidirectional Encoder Representations from Transformers), and T5 (Text-to-Text Transfer Transformer). These models have achieved impressive results on a wide range of natural language processing (NLP) tasks and are widely used across industries such as finance, healthcare, marketing, and entertainment.
HF result:
A large language model is a type of artificial intelligence (AI) model designed to understand and generate human language. These models are trained on large amounts of text data, which allows them to learn the patterns and structure of language and to produce text that reads like human-written text.
Large language models can be used for a variety of tasks, including language translation, text summarization, question answering, and text generation. They are commonly used in natural language processing (NLP) applications such as chatbots, virtual assistants, and language-understanding systems.
One of the key characteristics of large language models is their ability to generate coherent and contextually relevant text. This is achieved through deep learning algorithms, which allow the model to learn from large amounts of data and make predictions based on the patterns and relationships in that data.
Overall, large language models are powerful tools for understanding and generating human language, and they have the potential to revolutionize the way we interact with technology and with each other.
How can I make them produce consistent results?
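Following the float32 suggestion in the reply above, one hedged thing to try (slower and more memory-hungry, and still not guaranteed to be bit-identical) is running both stacks at higher precision so that float16 rounding differences between the kernels matter less:

# vLLM side
llm = LLM(model="/data/models/Qwen2-7B-Instruct", dtype='float32', gpu_memory_utilization=0.9)

# Transformers side
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float32, device_map="auto")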