PolyLLM is a Python package that provides a unified interface for interacting with multiple Large Language Models (LLMs) through a single, consistent API. It simplifies the process of working with different LLM providers by abstracting away their specific implementation details.
- Unified interface for multiple LLM providers:
  - Local LLMs (llama.cpp, llama-cpp-python)
  - Ollama
  - OpenAI (GPT models)
  - Google (Gemini models)
  - Anthropic (Claude models)
- Support for different interaction modes:
  - Standard chat completion
  - Multimodal image input
  - Function calling / tools
  - JSON output
  - Structured output (using Pydantic models)
  - Streaming real-time responses
| Provider | Standard Chat | Image Input | JSON | Structured Output | Tool Usage |
|---|---|---|---|---|---|
| llama.cpp | ✅ | 🔶 | ✅ | ✅ | ✅ |
| MLX | ✅ | 🚫 | 🚧 | 🚧 | 🚧 |
| Ollama | ✅ | ✅ | ✅ | ✅ | ✅ |
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ |
| Google | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anthropic | ✅ | ✅ | ✅ | ❌ | ✅ |
Streaming:

| Provider | Plain Text | JSON | Structured Output | Tool Usage |
|---|---|---|---|---|
| llama.cpp | ✅ | ✅ | ✅ | 🚫 |
| MLX | ✅ | 🚧 | 🚧 | 🚫 |
| Ollama | ✅ | ✅ | ✅ | 🚫 |
| OpenAI | ✅ | ✅ | ✅ | 🚫 |
| Google | ✅ | ✅ | ✅ | 🚫 |
| Anthropic | ✅ | 🚫 | ❌ | 🚫 |
✅: Supported
🔶: Support planned
❌: Not yet supported by the LLM provider
🚫: Support not planned
> **Warning**
>
> 🚧: MLX support for structuring techniques is not part of the official `mlx_lm` module. A modified version of this GBNF package is included here to support some interim capabilities. These features are experimental and will be buggy and slow!
```bash
pip install polyllm
pip install polyllm[all]  # Gets all optional provider dependencies
```
- Python 3.9+
- `backoff`
- `pydantic`
- Optional dependencies for advanced image input:
  - `numpy`
  - `opencv-python`
  - `pillow`
- Optional dependencies based on which LLM providers you want to use:
  - `llama-cpp-python`
  - `mlx-lm`
  - `ollama`
  - `openai`
  - `google-generativeai`
  - `anthropic`
  - `litellm`
Set your API keys as environment variables:

```bash
export OPENAI_API_KEY="your-key-here"
export GOOGLE_API_KEY="your-key-here"
export ANTHROPIC_API_KEY="your-key-here"
```
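The keys can also be set from Python before the first PolyLLM call (for example in a notebook); this is plain `os.environ` usage with placeholder values:

```python
import os

# Equivalent to the shell exports above; replace the placeholders with real keys
os.environ["OPENAI_API_KEY"] = "your-key-here"
os.environ["GOOGLE_API_KEY"] = "your-key-here"
os.environ["ANTHROPIC_API_KEY"] = "your-key-here"
```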
```bash
python -m polyllm.demo \
    --image-path /path/to/image.jpg \
    --llama-python-model /path/to/model.gguf \
    --llama-python-server-port 8000 \
    --ollama-model llama3.2-vision \
    --openai-model gpt-4o \
    --google-model gemini-1.5-flash-latest \
    --anthropic-model claude-3-5-sonnet-latest
```
The `model` argument may be provided as one of the following (an example sketch follows the list):

- An instance of `llama_cpp.Llama`
  - Helper function: `model = polyllm.load_helpers.load_llama("path/to/model.gguf")`
- A 2-tuple containing instances of `mlx.nn.Module` and `TokenizerWrapper`
  - Helper function: `model = polyllm.load_helpers.load_mlx("mlx-community/model-name-here", auto_download=True)`
    - Also accepts a path to a local directory for self-managed downloads
- `'llamacpp/MODEL'`, where `MODEL` is either the port or ip:port of a running llama-cpp-python server (`python -m llama_cpp.server --n_gpu_layers -1 --model path/to/model.gguf`)
  - Treated as `f'http://localhost:{MODEL}/v1'` if `MODEL` does NOT contain a `:`
  - Treated as `f'http://{MODEL}/v1'` if `MODEL` DOES contain a `:`
- `'ollama/MODEL_NAME'`, where `MODEL_NAME` matches the `ollama run MODEL_NAME` command
- `'openai/MODEL_NAME'`
- `'google/MODEL_NAME'`
- `'anthropic/MODEL_NAME'`
- `'litellm/PROVIDER/MODEL_NAME'`
  - LiteLLM will replace the OpenAI, Google, and Anthropic backends in a future update. At that point, you will no longer need to use `'litellm'` at the start of the string.
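As an example of the non-string forms, a local GGUF model can be loaded once with the llama.cpp helper and passed directly as `model` (a minimal sketch; the file path and prompt are placeholders):

```python
import polyllm

# Load a local GGUF model through the llama.cpp helper (path is a placeholder)
llama_model = polyllm.load_helpers.load_llama("/path/to/model.gguf")

# The returned llama_cpp.Llama instance is passed directly as `model`;
# provider-prefixed strings such as "openai/gpt-4o" work the same way.
response = polyllm.generate(
    model=llama_model,
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response)
```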
```python
def generate(
    model: str|Llama,
    messages: list,
    temperature: float = 0.0,
    json_output: bool = False,
    structured_output_model: BaseModel|None = None,
    stream: bool = False,
) -> str | Generator[str, None, None]:
```

Generates a chat message response as either a string or a generator of strings, depending on the `stream` argument.
```python
def generate_tools(
    model: str|Llama,
    messages: list,
    temperature: float = 0.0,
    tools: list[Callable] = None,
) -> tuple[str, str, dict]:
```

Asks the model to try to use one of the provided tools.

Responds with:
- Text response
- Tool name (use `get_tool_func` to get the tool object)
- Tool arguments dictionary
```python
def get_tool_func(
    tools: list[Callable],
    tool: str,
) -> Callable:
```

Returns the tool corresponding to the name. Intended for use with the output of `generate_tools`.
```python
def structured_output_model_to_schema(
    structured_output_model: BaseModel,
    indent: int|str|None = None,
) -> str:
```

Creates a JSON schema string from a Pydantic model. Include the string in one of the messages in a `generate(..., structured_output_model=...)` call to help guide the model on how to respond.
```python
def structured_output_to_object(
    structured_output: str,
    structured_output_model: type[BaseModel],
) -> BaseModel:
```

Parses the output of a `generate(..., structured_output_model=...)` call into an instance of the Pydantic BaseModel.
```python
import polyllm
```

Run `python -m polyllm` to see the full list of detected Ollama, OpenAI, Google, and Anthropic models.
```python
response = polyllm.generate(
    model="openai/gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.2,
)
print(response)
# Prints:
# Hello! I'm just a computer program, so I don't have feelings, but I'm here to help you. How can I assist you today?
```
```python
for chunk in polyllm.generate(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    temperature=0.7,
    stream=True,
):
    print(chunk, end='', flush=True)
print()
# Prints (a word or so at a time):
# Once upon a time, ...
```
```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image", "image": "/path/to/image"},
            # These also work if you have the image as
            # an np.array / PIL Image instead of on disk:
            # {"type": "image", "image": cv2.imread("/path/to/image")},
            # {"type": "image", "image": Image.open("/path/to/image")},
        ],
    },
]

response = polyllm.generate(
    model="ollama/llama3.2-vision",
    messages=messages,
)
print(response)
# Prints:
# This image depicts ...
```
```python
def multiply_large_numbers(x: int, y: int) -> int:
    """Multiplies two large numbers."""
    return x * y

tools = [multiply_large_numbers]

response, tool, args = polyllm.generate_tools(
    model="google/gemini-1.5-pro-latest",
    messages=[{"role": "user", "content": "What is 123456 multiplied by 654321?"}],
    tools=tools,
)

tool_func = polyllm.get_tool_func(tools, tool)
if tool_func:
    # print('response:', response)  # Some models (Anthropic) may return both their tool call AND a text response
    tool_result = tool_func(**args)
    print(tool_result)  # 123456 * 654321 = 80779853376
else:
    print(response)
# Prints:
# 80779853376.0
```
```python
response = polyllm.generate(
    model="anthropic/claude-3-5-sonnet-latest",
    messages=[{"role": "user", "content": "List three colors in JSON"}],
    json_output=True,
)
print(response)
# Prints:
# {
#   "colors": [
#     "red",
#     "blue",
#     "green"
#   ]
# }

import json
print(json.loads(response))
# Prints:
# {'colors': ['red', 'blue', 'green']}
```
```python
from pydantic import BaseModel, Field

class Flight(BaseModel):
    departure_time: str = Field(description="The time the flight departs")
    destination: str = Field(description="The destination of the flight")

class FlightList(BaseModel):
    flights: list[Flight] = Field(description="A list of known flight details")

flight_list_schema = polyllm.structured_output_model_to_schema(FlightList, indent=2)

response = polyllm.generate(
    model="google/gemini-1.5-pro-latest",
    messages=[
        {
            "role": "user",
            "content": f"Write a list of 2 to 5 random flight details.\nProduce the result in JSON that matches this schema:\n{flight_list_schema}",
        },
    ],
    structured_output_model=FlightList,
)
print(response)
# Prints:
# {"flights": [{"departure_time": "2024-07-20T08:30", "destination": "JFK"}, {"departure_time": "2024-07-21T14:00", "destination": "LAX"}, {"departure_time": "2024-07-22T16:45", "destination": "ORD"}, {"departure_time": "2024-07-23T09:15", "destination": "SFO"}]}

response_object = polyllm.structured_output_to_object(response, FlightList)
print(response_object.flights[0].destination)
# Prints:
# JFK
```
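Because `structured_output_to_object` returns an ordinary Pydantic instance, the parsed flights can be accessed like any other typed objects:

```python
# Iterate over the parsed FlightList from the previous example
for flight in response_object.flights:
    print(f"{flight.departure_time} -> {flight.destination}")
```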
```python
from polyllm.langchain import LCPolyLLM

llm = LCPolyLLM(model="openai/gpt-4")
response = llm.invoke("What is your name?")
print(response)
# Prints:
# As an artificial intelligence, I don't have a personal name. You can simply refer to me as OpenAI.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [("system", "you are a bot"), ("human", "{input}")]
)
chain = prompt | llm
response = chain.invoke({"input": "What are you?"})
print(response)
# Prints:
# Bot: I am an artificial intelligence assistant designed to help answer questions and provide information.
```