Skip to content

Latest commit

 

History

History
24 lines (16 loc) · 855 Bytes

README.md

File metadata and controls

24 lines (16 loc) · 855 Bytes

MLX RAG

Explore a simple example of utilizing MLX for RAG application running locally on your Apple Silicon device.

I have previously converted the weights for the embedding model gte-large into MLX format, and you can find them stored here in the mlx-rag repository. Additionally, as a base model, I am using NeuralBeagle14-7B-4bit-mlx.

Getting started

  • Install requirements
python3 -m pip install -r requirements.txt
  • Create vector database from a pdf file
python3 create_vdb.py --pdf flash_attention.pdf --vdb vdb.npz
  • Query database (pdf file)
python3 query_vdb.py --question "what is flash attention?"