Home

Aphrodite Engine

Aphrodite Engine is designed for serving LLMs at scale. It supports the majority of HuggingFace models, including Llama, Mistral, and Mixtral.

Aphrodite also supports several weight quantization methods for smaller-scale deployments. The currently supported methods are GPTQ, AWQ, GGUF, QuIP#, Marlin, and SqueezeLLM. KV cache quantization is also supported, using the FP8 E5M2 format.
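
To give a rough feel for what the FP8 E5M2 format stores, the sketch below encodes a value into one byte with 1 sign bit, 5 exponent bits, and 2 mantissa bits. It relies on the fact that E5M2 shares its sign and exponent layout with IEEE float16, so dropping the low 8 bits of a float16 yields an E5M2 byte. This is an illustration only, not Aphrodite's implementation: the function names are hypothetical, and the sketch truncates where a real kernel would typically round.

```python
import struct


def float_to_e5m2(x: float) -> int:
    """Encode x as an E5M2 byte by truncating its float16 bit pattern.

    Hypothetical helper for illustration. struct raises OverflowError
    if x is too large to be represented as a float16.
    """
    (bits16,) = struct.unpack("<H", struct.pack("<e", x))
    # Keep the sign bit, the 5 exponent bits, and the top 2 mantissa bits.
    return bits16 >> 8


def e5m2_to_float(b: int) -> float:
    """Decode an E5M2 byte by padding it back out to a float16."""
    (x,) = struct.unpack("<e", struct.pack("<H", (b & 0xFF) << 8))
    return x


if __name__ == "__main__":
    for value in (1.0, 1.3, 0.1, -2.0):
        byte = float_to_e5m2(value)
        print(f"{value!r} -> 0x{byte:02X} -> {e5m2_to_float(byte)!r}")
```

With only 2 mantissa bits, values between representable points lose precision (for example, 1.3 decodes back as 1.25), which is the trade-off for halving KV cache memory relative to 16-bit storage.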

Please refer to the Installation page for instructions on how to use the engine.
