This project demonstrates the creation of a Retrieval-Augmented Generation (RAG) application that interacts with the Mishnah, an ancient Rabbinic text, using AWS Bedrock, LangChain, and ChromaDB. The application supports both English and Hebrew interactions.
- Introduction
- Setup
- Dataset
- Running the RAG Application
- Multilingual RAG Approach
- Conclusion
- Contributing
- License
This project aims to make ancient Jewish texts more accessible by enabling semantic search and finding related sources. The same approach can be applied to any other collection of texts. It combines embedding-based retrieval from a ChromaDB vector store with LLM text generation through AWS Bedrock, tied together with LangChain.
- Clone the repository:
git clone https://github.com/shlomota/MishnahBot.git
cd MishnahBot
- Install the necessary Python packages:
pip install -r requirements.txt
- Set up access to AWS Bedrock, which the application uses to call third-party LLMs. Alternatively, you can alter the code to use the key for your preferred LLM.
The dataset for this project is the Mishnah, obtained from the Sefaria-Export repository. The dataset includes both English translations and the original Hebrew text.
To build the vector database from scratch, first download the dataset with the following commands:
git init sefaria-json
cd sefaria-json
git sparse-checkout init --cone
git sparse-checkout set json
git remote add origin https://github.com/Sefaria/Sefaria-Export.git
git pull origin master
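Then, from the directory that contains the Mishnah folder, copy the merged.json files into a separate directory while preserving their relative paths:
mkdir -p new_directory
find Mishnah/Seder* -name "merged.json" -exec cp --parents \{\} new_directory/ \;
Optionally, install tree to inspect the directory layout:
sudo apt install tree
tree Mishnah/ | less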
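Once the files are copied, each merged.json can be flattened into individual passages before embedding. The sketch below is a minimal illustration, not the repository's code. It assumes each file is a JSON object with a `title` string and a `text` field holding a list of chapters, each of which is a list of passage strings; check a real file to confirm the layout. The names `flatten_passages` and `load_all`, and the `Title chapter:passage` reference format, are illustrative choices.

```python
import json
from pathlib import Path


def flatten_passages(doc):
    """Turn one parsed merged.json document into (reference, text) pairs.

    Assumption: doc["text"] is a list of chapters, each a list of strings.
    """
    title = doc.get("title", "Unknown")
    passages = []
    for chapter_no, chapter in enumerate(doc.get("text", []), start=1):
        for passage_no, passage in enumerate(chapter, start=1):
            # Skip empty or non-string entries rather than embedding blanks.
            if isinstance(passage, str) and passage.strip():
                reference = f"{title} {chapter_no}:{passage_no}"
                passages.append((reference, passage.strip()))
    return passages


def load_all(root):
    """Collect passages from every merged.json found under root."""
    passages = []
    for path in sorted(Path(root).rglob("merged.json")):
        with open(path, encoding="utf-8") as handle:
            passages.extend(flatten_passages(json.load(handle)))
    return passages


if __name__ == "__main__":
    # Tiny in-memory sample with the assumed structure.
    sample = {
        "title": "Mishnah Berakhot",
        "text": [["First passage.", ""], ["Second passage."]],
    }
    for reference, text in flatten_passages(sample):
        print(reference, "->", text)
```

Keeping a reference string next to each passage makes it possible to show the source alongside a retrieved result, which suits the goal of finding related sources.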
To continue building the application, follow the notebook. Alternatively, run the Streamlit app directly:
cd src/
streamlit run app.py
This application supports both Hebrew and English interactions, with a separate flow for each language.
For a Hebrew query:
- Input query in Hebrew.
- Translate the query to English.
- Embed the query and retrieve relevant documents.
- Use the original Hebrew texts as context.
- Generate the response in Hebrew.
For an English query:
- Input query in English.
- Embed the query and retrieve relevant documents.
- Generate the response in English.
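The two flows can be sketched as a single routing function. This is a hedged illustration rather than the application's actual code: `translate`, `retrieve`, and `generate` are hypothetical callables standing in for the LLM and vector store calls, and detecting Hebrew by checking for characters in the Hebrew Unicode block is an assumption made here. The sketch also assumes each retrieved document carries both an English (`en`) and a Hebrew (`he`) version of the passage, and that the English flow uses the English version as context.

```python
def is_hebrew(text):
    """Return True if the text contains any character in the Hebrew Unicode block."""
    return any("\u0590" <= ch <= "\u05ff" for ch in text)


def answer(query, translate, retrieve, generate):
    """Route a query through the Hebrew or the English flow.

    Hypothetical helpers supplied by the caller:
      translate(text) -> English text
      retrieve(english_query) -> list of dicts with "en" and "he" keys
      generate(query, context, language) -> answer string
    """
    hebrew = is_hebrew(query)
    # Hebrew queries are translated first, so retrieval always runs in English.
    search_query = translate(query) if hebrew else query
    docs = retrieve(search_query)
    # The Hebrew flow passes the original Hebrew texts to the model as context.
    language_key = "he" if hebrew else "en"
    context = [doc[language_key] for doc in docs]
    return generate(query, context, "Hebrew" if hebrew else "English")


if __name__ == "__main__":
    # Stub helpers so the routing can be exercised without any external service.
    def fake_translate(text):
        return "translated query"

    def fake_retrieve(english_query):
        return [{"en": "English passage", "he": "Hebrew passage"}]

    def fake_generate(query, context, language):
        return f"[{language}] answer based on {len(context)} passage(s)"

    print(answer("What does the Mishnah say about blessings?",
                 fake_translate, fake_retrieve, fake_generate))
```

Passing the helpers in as arguments keeps the routing logic independent of any particular LLM provider, which matches the note in the setup steps that the Bedrock calls can be swapped for another LLM.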
This project highlights the potential of RAG applications in making ancient texts accessible and interactive. By combining modern AI technologies with traditional texts, we can create powerful tools for education and research.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.