Speculators

Overview

Speculators is a unified library for building, evaluating, and storing speculative decoding algorithms for large language model (LLM) inference, including in frameworks like vLLM. Speculative decoding is a lossless technique that speeds up LLM inference by using a smaller, faster speculator model to propose tokens, which are then verified by the larger base model, reducing latency without compromising output quality. Speculators standardizes this process with reusable formats and tools, enabling easier integration and deployment of speculative decoding in production-grade inference servers.
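The propose-and-verify loop described above can be illustrated with a minimal, framework-free sketch. It does not use the Speculators or vLLM APIs: `speculative_step`, `generate`, `toy_base`, and `toy_speculator` are hypothetical names introduced only for illustration, the "models" are plain Python functions, and only the greedy-decoding case is shown (sampling-based verification works differently).

```python
from typing import Callable, List, Sequence

# Assumption for this sketch: a "model" is any function that maps a token
# sequence to its greedy next token. Real models return logits instead.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(base: NextToken, speculator: NextToken,
                     tokens: List[int], k: int = 4) -> List[int]:
    """Run one propose-and-verify round and return the newly accepted tokens."""
    # 1. The speculator drafts k tokens autoregressively.
    draft: List[int] = []
    for _ in range(k):
        draft.append(speculator(tokens + draft))

    # 2. The base model checks each drafted position. A real server scores all
    #    positions in one batched forward pass; this loop is the sequential
    #    equivalent.
    accepted: List[int] = []
    for proposed in draft:
        expected = base(tokens + accepted)
        if proposed != expected:
            accepted.append(expected)  # replace the first wrong draft token
            return accepted
        accepted.append(proposed)

    # 3. Every draft token matched, so the base model adds one bonus token.
    accepted.append(base(tokens + accepted))
    return accepted


def generate(base: NextToken, speculator: NextToken,
             prompt: Sequence[int], max_new_tokens: int, k: int = 4) -> List[int]:
    """Generate max_new_tokens tokens after the prompt using speculative steps."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens.extend(speculative_step(base, speculator, tokens, k))
    return tokens[:len(prompt) + max_new_tokens]


def greedy(base: NextToken, prompt: Sequence[int], max_new_tokens: int) -> List[int]:
    """Reference: ordinary one-token-at-a-time greedy decoding with the base model."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(base(tokens))
    return tokens


# Toy models: the base counts upward modulo 10; the speculator agrees except
# that it guesses 0 whenever the base would emit 7.
def toy_base(tokens: Sequence[int]) -> int:
    return (tokens[-1] + 1) % 10


def toy_speculator(tokens: Sequence[int]) -> int:
    nxt = (tokens[-1] + 1) % 10
    return 0 if nxt == 7 else nxt
```

In this greedy setting the output is identical to decoding with the base model alone, because every emitted token is either confirmed or supplied by the base model. The speedup comes from the base model verifying several drafted tokens per round instead of producing one token per forward pass.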

Key Features

Unified Speculative Decoding Toolkit: Simplifies the development, evaluation, and representation of speculative decoding algorithms, supporting both research and production use cases for LLMs.
Standardized, Extensible Format: Provides a Hugging Face-compatible format for defining speculative models, with tools to convert from external research repositories for easy adoption.
Seamless vLLM Integration: Built for direct deployment into vLLM, enabling low-latency, production-grade inference with minimal overhead.

Getting Started

Installation

Before installing, ensure you have the following prerequisites:

OS: Linux or macOS
Python: 3.9 or higher
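
The prerequisites above can be checked with a short standard-library script before installing. This is only a convenience sketch; `check_prerequisites` is a hypothetical helper and is not part of Speculators.

```python
import platform
import sys

MIN_PYTHON = (3, 9)
# platform.system() reports macOS as "Darwin".
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}


def check_prerequisites(system=None, version=None):
    """Return a list of human-readable problems; an empty list means all good.

    Both arguments default to the values of the running interpreter and can be
    overridden to check a different configuration.
    """
    system = platform.system() if system is None else system
    version = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])

    problems = []
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"Unsupported OS: {system} (expected Linux or macOS)")
    if version < MIN_PYTHON:
        found = ".".join(str(part) for part in version)
        problems.append(f"Python {found} found, but 3.9 or higher is required")
    return problems


# Report on the current environment.
for problem in check_prerequisites():
    print(problem)
```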

Install Speculators directly from source using pip:

pip install git+https://github.com/neuralmagic/speculators.git
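
After installation, one way to confirm that the package is visible to the current interpreter is to look it up without importing it. The sketch below assumes the import name is `speculators`, matching the repository name; `is_installed` is a hypothetical helper, not a Speculators function.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the package is imported as "speculators".
print("speculators importable:", is_installed("speculators"))
```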

Resources

The following research implementations provide prototype code for immediate experimentation. We plan to productize them into the main package.

eagle3: This implementation trains models similar to the EAGLE 3 architecture, using the training-time test method.
hass: This implementation trains models that are a variation on the EAGLE 1 architecture using the HASS method.

License

Speculators is licensed under the Apache License 2.0.

Cite

If you find Speculators helpful in your research or projects, please consider citing it:

@misc{speculators2025,
  title={Speculators: A Unified Library for Speculative Decoding Algorithms in LLM Serving},
  author={Red Hat},
  year={2025},
  howpublished={\url{https://github.com/neuralmagic/speculators}},
}
