MLX RAG

Explore a simple example of using MLX for a RAG application that runs locally on your Apple Silicon device.

I previously converted the weights of the embedding model gte-large to MLX format; you can find them in the mlx-rag repository. As the base model, I am using NeuralBeagle14-7B-4bit-mlx.

Getting started

Install requirements

python3 -m pip install -r requirements.txt

Create a vector database from a PDF file

python3 create_vdb.py --pdf flash_attention.pdf --vdb vdb.npz
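Building a RAG vector database usually starts by splitting the text extracted from the PDF into overlapping chunks, each of which is then embedded and stored (here, in `vdb.npz`). The sketch below shows one common approach, fixed-size character windows with overlap. It is an illustration only: `chunk_text`, its parameters, and the default sizes are hypothetical and not necessarily what `create_vdb.py` does.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters.

    Hypothetical helper: each window starts (chunk_size - overlap) characters
    after the previous one, so neighbouring chunks share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text to avoid tiny tail chunks.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk; larger chunks give the model more context per hit at the cost of less precise retrieval.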

Query the vector database built from the PDF

python3 query_vdb.py --question "what is flash attention?"
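At query time, a RAG pipeline typically embeds the question, ranks the stored chunks by similarity to it, and passes the best matches to the base model as context. The dependency-free sketch below shows that retrieval step with cosine similarity, assuming the question and chunks have already been embedded (for example with gte-large). The names `cosine_similarity`, `top_k`, and `build_prompt`, the toy 2-D vectors, and the prompt template are all hypothetical, not the actual code of `query_vdb.py`.

```python
from __future__ import annotations

import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]],
          chunks: Sequence[str], k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in order[:k]]


def build_prompt(question: str, contexts: Sequence[str]) -> str:
    """Hypothetical prompt template that puts retrieved chunks before the question."""
    context_block = "\n\n".join(contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Toy data: real gte-large embeddings have far more dimensions.
    chunks = ["FlashAttention uses tiling.", "Unrelated text.", "Attention is computed in blocks."]
    vecs = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.2]]
    best = top_k([1.0, 0.0], vecs, chunks, k=2)
    print(build_prompt("what is flash attention?", best))
```

With the toy vectors, the first and third chunks point in nearly the same direction as the query, so they are the two retrieved; the unrelated chunk scores 0. If the stored embeddings are already L2-normalized, the cosine similarity reduces to a plain dot product.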
