fastRAG 2.0: Let's do RAG Efficiently 🔥

fastRAG 2.0 adds highly anticipated efficiency-oriented components, an updated chat-like demo experience with multi-modality, and improvements to existing components.

The library now uses Intel optimizations through Intel® Extension for PyTorch (IPEX), 🤗 Optimum Intel and 🤗 Optimum-Habana to run as efficiently as possible on Intel® Xeon® Processors and Intel® Gaudi® AI accelerators.

🚀 Intel Habana Gaudi 1 and Gaudi 2 Support

fastRAG is the first RAG framework to support Habana Gaudi accelerators for running LLMs efficiently; more details here.

🌀 Running LLMs with the ONNX Runtime and LlamaCPP Backends

Added support for running quantized LLMs with the ONNX Runtime and LlamaCPP backends, for higher efficiency and speed across your RAG pipelines.

⚡ CPU Efficient Embedders

We added support for running bi-encoder embedders and cross-encoder rankers as efficiently as possible on Intel CPUs using Intel-optimized software.

We integrated the optimized embedders into the following two components:

QuantizedBiEncoderRanker - bi-encoder ranker; encodes the documents provided in the input and re-orders them according to their similarity to the query.
QuantizedBiEncoderRetriever - bi-encoder retriever; encodes documents into vectors for a given vector store engine.
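
To illustrate the re-ordering step a bi-encoder ranker performs, here is a minimal sketch in plain Python. It is not the fastRAG API: `embed`, `cosine` and `rerank` are hypothetical helpers, and the bag-of-words `embed` function is only a toy stand-in for a real (quantized) embedding model.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector (assumption)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rerank(query, documents):
    """Encode the query and each document independently, then sort by similarity."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in documents]
    return [d for _, d in sorted(scored, key=lambda pair: pair[0], reverse=True)]

docs = [
    "gaudi accelerators run large models",
    "quantized embedders speed up retrieval on cpu",
    "bananas are yellow",
]
ranked = rerank("fast cpu retrieval", docs)
print(ranked[0])
```

Because the query and the documents are encoded separately, document vectors can be computed once and reused, which is what lets a retriever store them in a vector store.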

⏳ REPLUG

An implementation of REPLUG, an advanced technique for ensemble prompting: each retrieved document is processed in parallel as a separate prompt, and the resulting next-token predictions are combined for better results.
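
The combination step can be sketched as a weighted average of per-document next-token distributions, with weights taken from a softmax over the retrieval scores. This is a minimal illustration under those assumptions, not fastRAG's implementation: `replug_ensemble`, the scores and the probabilities below are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of retrieval scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def replug_ensemble(retrieval_scores, next_token_probs):
    """Combine per-document next-token distributions into one distribution.

    retrieval_scores: one similarity score per retrieved document.
    next_token_probs: one {token: probability} dict per document, as produced
    by running the LLM on (document + query) for each document separately.
    """
    weights = softmax(retrieval_scores)
    vocab = set().union(*next_token_probs)
    return {
        tok: sum(w * p.get(tok, 0.0) for w, p in zip(weights, next_token_probs))
        for tok in vocab
    }

# Hypothetical values for two retrieved documents.
scores = [2.0, 0.5]
per_doc = [
    {"Paris": 0.9, "Lyon": 0.1},
    {"Paris": 0.4, "Lyon": 0.6},
]
combined = replug_ensemble(scores, per_doc)
best = max(combined, key=combined.get)
print(best)
```

Each per-document forward pass is independent, so the passes can be batched or run in parallel before the weighted sum is taken.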

🏆 New Demos

We updated our demos (and demo page) to include two new demos that showcase a chat-like experience and multi-modal RAG.

🐠 Enhancements

Added documentation for most models and components, including examples and ready-to-run notebooks.
Support for the Fusion-in-Decoder (FiD) model using a dedicated invocation layer.
Various bug fixes and compatibility updates supporting the Haystack framework.

Full Changelog: v1.3.0...v2.0


v2.0.0