PolyLLM is a Python package that provides a unified interface for interacting with multiple Large Language Models (LLMs) through a single, consistent API. It simplifies the process of working with different LLM providers by abstracting away their specific implementation details.
- Unified interface for multiple LLM providers:
  - Local LLMs (llama.cpp, llama-cpp-python)
  - Ollama
  - OpenAI (GPT models)
  - Google (Gemini models)
  - Anthropic (Claude models)
- Support for different interaction modes:
  - Standard chat completion
  - Multimodal image input
  - Function calling / tools
  - JSON output
  - Structured output (using Pydantic models)
  - Streaming real-time responses
| Provider | Standard Chat | Image Input | JSON | Structured Output | Tool Usage |
|---|---|---|---|---|---|
| llama.cpp | ✅ | 🔶 | ✅ | ✅ | ✅ |
| MLX | ✅ | 🚫 | 🚧 | 🚧 | 🚧 |
| Ollama | ✅ | ✅ | ✅ | ✅ | ✅ |
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ |
| Google | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anthropic | ✅ | ✅ | ✅ | ❌ | ✅ |
Streaming:

| Provider | Plain Text | JSON | Structured Output | Tool Usage |
|---|---|---|---|---|
| llama.cpp | ✅ | ✅ | ✅ | 🚫 |
| MLX | ✅ | 🚧 | 🚧 | 🚫 |
| Ollama | ✅ | ✅ | ✅ | 🚫 |
| OpenAI | ✅ | ✅ | ✅ | 🚫 |
| Google | ✅ | ✅ | ✅ | 🚫 |
| Anthropic | ✅ | 🚫 | ❌ | 🚫 |
✅: Supported
🔶: Support planned
❌: Not yet supported by the LLM provider
🚫: Support not planned
> **Warning**
>
> 🚧: MLX support for structuring techniques is not part of the official `mlx_lm` module. A modified version of this GBNF package is included here to support some interim capabilities. These features are experimental and will be buggy and slow!
```bash
pip install polyllm
pip install polyllm[all]  # Gets all optional provider dependencies
```
- Python 3.9+
- `backoff`
- `pydantic`
- Optional dependencies for advanced image input:
  - `numpy`
  - `opencv-python`
  - `pillow`
- Optional dependencies based on which LLM providers you want to use:
  - `llama-cpp-python`
  - `mlx-lm`
  - `ollama`
  - `openai`
  - `google-generativeai`
  - `anthropic`
  - `litellm`
Set your API keys as environment variables:

```bash
export OPENAI_API_KEY="your-key-here"
export GOOGLE_API_KEY="your-key-here"
export ANTHROPIC_API_KEY="your-key-here"
```
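The keys can also be set from Python before the first PolyLLM call (for example in a notebook); this is plain `os.environ` usage with placeholder values:

```python
import os

# Equivalent to the shell exports above; replace the placeholders with real keys
os.environ["OPENAI_API_KEY"] = "your-key-here"
os.environ["GOOGLE_API_KEY"] = "your-key-here"
os.environ["ANTHROPIC_API_KEY"] = "your-key-here"
```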
```bash
python -m polyllm.demo \
    --image-path /path/to/image.jpg \
    --llama-python-model /path/to/model.gguf \
    --llama-python-server-port 8000 \
    --ollama-model llama3.2-vision \
    --openai-model gpt-4o \
    --google-model gemini-1.5-flash-latest \
    --anthropic-model claude-3-5-sonnet-latest
```
The `model` argument may be provided as one of the following (an example sketch follows the list):

- An instance of `llama_cpp.Llama`
  - Helper function: `model = polyllm.load_helpers.load_llama("path/to/model.gguf")`
- A 2-tuple containing instances of `mlx.nn.Module` and `TokenizerWrapper`
  - Helper function: `model = polyllm.load_helpers.load_mlx("mlx-community/model-name-here", auto_download=True)`
    - Also accepts a path to a local directory for self-managed downloads
- `'llamacpp/MODEL'`, where `MODEL` is either the port or ip:port of a running llama-cpp-python server (`python -m llama_cpp.server --n_gpu_layers -1 --model path/to/model.gguf`)
  - Treated as `f'http://localhost:{MODEL}/v1'` if `MODEL` does NOT contain a `:`
  - Treated as `f'http://{MODEL}/v1'` if `MODEL` DOES contain a `:`
- `'ollama/MODEL_NAME'`, where `MODEL_NAME` matches the `ollama run MODEL_NAME` command
- `'openai/MODEL_NAME'`
- `'google/MODEL_NAME'`
- `'anthropic/MODEL_NAME'`
- `'litellm/PROVIDER/MODEL_NAME'`
  - LiteLLM will replace the OpenAI, Google, and Anthropic backends in a future update. At that point, you will no longer need to use `'litellm'` at the start of the string.
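As an example of the non-string forms, a local GGUF model can be loaded once with the llama.cpp helper and passed directly as `model` (a minimal sketch; the file path and prompt are placeholders):

```python
import polyllm

# Load a local GGUF model through the llama.cpp helper (path is a placeholder)
llama_model = polyllm.load_helpers.load_llama("/path/to/model.gguf")

# The returned llama_cpp.Llama instance is passed directly as `model`;
# provider-prefixed strings such as "openai/gpt-4o" work the same way.
response = polyllm.generate(
    model=llama_model,
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response)
```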
```python
def generate(
    model: str|Llama,
    messages: list,
    temperature: float = 0.0,
    json_output: bool = False,
    structured_output_model: BaseModel|None = None,
    stream: bool = False,
) -> str | Generator[str, None, None]:
```

Generates a chat message response as either a string or a generator of strings, depending on the `stream` argument.
```python
def generate_tools(
    model: str|Llama,
    messages: list,
    temperature: float = 0.0,
    tools: list[Callable] = None,
) -> tuple[str, str, dict]:
```

Asks the model to try to use one of the provided tools.

Responds with:
- Text response
- Tool name (use `get_tool_func` to get the tool object)
- Tool arguments dictionary
```python
def get_tool_func(
    tools: list[Callable],
    tool: str,
) -> Callable:
```

Returns the tool corresponding to the name. Intended for use with the output of `generate_tools`.
```python
def structured_output_model_to_schema(
    structured_output_model: BaseModel,
    indent: int|str|None = None,
) -> str:
```

Creates a JSON schema string from a Pydantic model. Include the string in one of the messages in a `generate(..., structured_output_model=...)` call to help guide the model on how to respond.
```python
def structured_output_to_object(
    structured_output: str,
    structured_output_model: type[BaseModel],
) -> BaseModel:
```

Parses the output of a `generate(..., structured_output_model=...)` call into an instance of the Pydantic BaseModel.
```python
import polyllm
```

Run `python -m polyllm` to see the full list of detected Ollama, OpenAI, Google, and Anthropic models.
```python
response = polyllm.generate(
    model="openai/gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.2,
)
print(response)
# Prints:
# Hello! I'm just a computer program, so I don't have feelings, but I'm here to help you. How can I assist you today?
```
```python
for chunk in polyllm.generate(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    temperature=0.7,
    stream=True,
):
    print(chunk, end='', flush=True)
print()
# Prints (a word or so at a time):
# Once upon a time, ...
```
```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image", "image": "/path/to/image"},
            # These also work if you have the image as
            # an np.array / PIL Image instead of on disk:
            # {"type": "image", "image": cv2.imread("/path/to/image")},
            # {"type": "image", "image": Image.open("/path/to/image")},
        ],
    },
]

response = polyllm.generate(
    model="ollama/llama3.2-vision",
    messages=messages,
)
print(response)
# Prints:
# This image depicts ...
```
```python
def multiply_large_numbers(x: int, y: int) -> int:
    """Multiplies two large numbers."""
    return x * y

tools = [multiply_large_numbers]

response, tool, args = polyllm.generate_tools(
    model="google/gemini-1.5-pro-latest",
    messages=[{"role": "user", "content": "What is 123456 multiplied by 654321?"}],
    tools=tools,
)

tool_func = polyllm.get_tool_func(tools, tool)
if tool_func:
    # print('response:', response)  # Some models (Anthropic) may return both their tool call AND a text response
    tool_result = tool_func(**args)
    print(tool_result)  # 123456 * 654321 = 80779853376
else:
    print(response)
# Prints:
# 80779853376.0
```
```python
response = polyllm.generate(
    model="anthropic/claude-3-5-sonnet-latest",
    messages=[{"role": "user", "content": "List three colors in JSON"}],
    json_output=True,
)
print(response)
# Prints:
# {
#   "colors": [
#     "red",
#     "blue",
#     "green"
#   ]
# }

import json
print(json.loads(response))
# Prints:
# {'colors': ['red', 'blue', 'green']}
```
```python
from pydantic import BaseModel, Field

class Flight(BaseModel):
    departure_time: str = Field(description="The time the flight departs")
    destination: str = Field(description="The destination of the flight")

class FlightList(BaseModel):
    flights: list[Flight] = Field(description="A list of known flight details")

flight_list_schema = polyllm.structured_output_model_to_schema(FlightList, indent=2)

response = polyllm.generate(
    model="google/gemini-1.5-pro-latest",
    messages=[
        {
            "role": "user",
            "content": f"Write a list of 2 to 5 random flight details.\nProduce the result in JSON that matches this schema:\n{flight_list_schema}",
        },
    ],
    structured_output_model=FlightList,
)
print(response)
# Prints:
# {"flights": [{"departure_time": "2024-07-20T08:30", "destination": "JFK"}, {"departure_time": "2024-07-21T14:00", "destination": "LAX"}, {"departure_time": "2024-07-22T16:45", "destination": "ORD"}, {"departure_time": "2024-07-23T09:15", "destination": "SFO"}]}

response_object = polyllm.structured_output_to_object(response, FlightList)
print(response_object.flights[0].destination)
# Prints:
# JFK
```
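Because `structured_output_to_object` returns an ordinary Pydantic instance, the parsed flights can be accessed like any other typed objects:

```python
# Iterate over the parsed FlightList from the previous example
for flight in response_object.flights:
    print(f"{flight.departure_time} -> {flight.destination}")
```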
```python
from polyllm.langchain import LCPolyLLM

llm = LCPolyLLM(model="openai/gpt-4")
response = llm.invoke("What is your name?")
print(response)
# Prints:
# As an artificial intelligence, I don't have a personal name. You can simply refer to me as OpenAI.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [("system", "you are a bot"), ("human", "{input}")]
)
chain = prompt | llm
response = chain.invoke({"input": "What are you?"})
print(response)
# Prints:
# Bot: I am an artificial intelligence assistant designed to help answer questions and provide information.
```