How to use transformers.pipeline with a model loaded through FastLanguageModel?
FastLanguageModel:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
# "unsloth/Llama-3.3-70B-Instruct-bnb-4bit"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.3-70B-Instruct-bnb-4bit", # "unsloth/mistral-7b" for 16bit loading
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
Now I want to integrate this faster model into transformers.pipeline:
from transformers import pipeline
pipe = pipeline("text-generation", model="unsloth/Llama-3.3-70B-Instruct-bnb-4bit")
pipe(messages)  # messages would be the chat history, e.g. [{"role": "user", "content": "..."}]
How can I do that?
Any comments would be appreciated!
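For reference, one approach that might work is to hand the already-loaded model and tokenizer objects to pipeline directly instead of a model name string, so the checkpoint is not resolved and loaded a second time. This is only a sketch under the assumptions that pipeline accepts pre-instantiated model/tokenizer objects and that the tokenizer carries a chat template for the messages format; whether the Unsloth inference patches stay active inside the pipeline is not something I have verified:

from transformers import pipeline

# Sketch: reuse the objects returned by FastLanguageModel.from_pretrained above
# rather than letting pipeline() download/load the checkpoint by name again.
pipe = pipeline(
    "text-generation",
    model = model,          # the Unsloth-patched model from the snippet above
    tokenizer = tokenizer,  # the matching tokenizer
)

# Chat-style input; if your transformers version does not apply chat templates
# inside the pipeline, pass a plain prompt string instead of this list.
messages = [{"role": "user", "content": "Summarize RoPE scaling in one sentence."}]
out = pipe(messages, max_new_tokens = 64)
print(out[0]["generated_text"])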