How to use pipeline to load the FastLanguageModel #1471

Open

Hyfred opened this issue Dec 24, 2024 · 0 comments

Comments

@Hyfred

Hyfred commented Dec 24, 2024

How do I use transformers.pipeline with a model loaded through FastLanguageModel?

Loading with FastLanguageModel:

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# "unsloth/Llama-3.3-70B-Instruct-bnb-4bit"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.3-70B-Instruct-bnb-4bit", # "unsloth/mistral-7b" for 16bit loading
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

Now I want to integrate this faster model into transformers.pipeline:

from transformers import pipeline

messages = [{"role": "user", "content": "Hello, who are you?"}]  # example chat-format input
pipe = pipeline("text-generation", model="unsloth/Llama-3.3-70B-Instruct-bnb-4bit")
pipe(messages)

How can I do that?
Any comments would be appreciated!
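
For what it's worth, my best guess (untested, so please correct me if this is wrong) is to hand the already-loaded model and tokenizer objects to pipeline instead of a repo name, so pipeline reuses them rather than running its own from_pretrained:

from unsloth import FastLanguageModel
from transformers import pipeline

# Load with Unsloth as above (4-bit, auto dtype).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.3-70B-Instruct-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# Guess: pass the loaded objects directly so pipeline skips loading from the Hub.
# Unverified whether Unsloth's patched model behaves correctly inside pipeline.
pipe = pipeline("text-generation", model = model, tokenizer = tokenizer)

messages = [{"role": "user", "content": "Hello, who are you?"}]
print(pipe(messages, max_new_tokens = 64))

Is this the intended way, or does the Unsloth model need special handling inside pipeline?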
