Welcome to MoLaD (Modular Language Advisor), your modular AI companion!
MoLaD is an LLM server built with tools for PDF-based Retrieval-Augmented Generation (RAG), web search, and conversation history management. The project aims to provide a flexible, scalable framework for integrating various AI tools as needed.
The goal behind this project was to build an LLM server with custom agents that is:
- Open and Flexible: Designed to avoid dependencies on specific models or libraries, making it highly adaptable to various use cases.
- Modular Architecture: The code is structured to be as modular as possible, allowing it to serve as a foundational base for similar projects or future extensions.
- Built from Scratch: While many open-source libraries exist, this project was built independently to ensure that individual components can function autonomously, promoting scalability and reusability.
This approach ensures that MoLaD is not only powerful but also extensible, encouraging innovation and customization.
- Base Load: 3.5 GB
  - Includes 2 instances of SmolLMv2 and 1 instance of Llama-3.2-3B-Instruct-Q8_0.gguf
- Ingestion Process: 5 GB
  - Required during `/ingest` operations for the vector database and re-ranking models
- Peak Usage: 5.8 GB
  - Observed during response generation
- Generation Speed: ~18 tokens/second
- Llama-3.2-3B-Instruct-Q8_0.gguf: Primary LLM used to generate responses.
- thenlper/gte-small: Embedding model for creating vectorized representations in the vector store (ChromaDB).
- SmolLMv2-360M-instruct: Lightweight LLM used to filter and select relevant outputs from ChromaDB and also to summarize data.
- BAAI/bge-small-en: Cross-encoder used for re-ranking documents for precision in final outputs.
I placed the Llama-3.2-3B-Instruct model in the parent folder of MoLaD. You can do the same, or update the path in worker.py's main() accordingly.
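For reference, here is a hypothetical sketch of how that path might be loaded with llama-cpp-python; the variable names and parameters are assumptions, and worker.py's actual code may differ.

```python
# Hypothetical sketch of the model loading in worker.py's main();
# the actual variable names and parameters may differ.
from llama_cpp import Llama

MODEL_PATH = "../Llama-3.2-3B-Instruct-Q8_0.gguf"  # parent folder of MoLaD

llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,       # context window size
)
```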
- Advanced RAG with Multi-Level Filtering:
  MoLaD uses a robust three-step document retrieval process (see the retrieval sketch after this list):
  - Cosine Similarity: Quickly fetches the most relevant documents from the ChromaDB vector database.
  - LLMChainFilter: Filters out irrelevant documents using the SmolLMv2-360M-instruct model, ensuring only relevant content reaches the next stage.
  - Cross-Encoder Reranker: Re-ranks documents for precision using BAAI/bge-small-en, fine-tuning results for accuracy.
- Web Search Integration:
  Combines knowledge from web searches, allowing MoLaD to write well-rounded answers to user queries.
- Powered by Cutting-Edge LLMs:
  - Utilizes the Llama-3.2-3B-Instruct-Q8_0.gguf model to generate detailed and insightful answers.
  - Efficient inference with Redis-powered batch processing ensures smooth multi-user support.
- Optimized for Performance:
  - Built with Langchain for seamless integration of components.
  - Uses Redis for asynchronous request handling, enabling batch inference and improving responsiveness.
  - Deployed using FastAPI and Gunicorn, ensuring scalable and robust performance.
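To make the three-step retrieval concrete, here is a minimal sketch using LangChain's compression retrievers. The import paths reflect recent LangChain package splits and may vary by version; the model paths, persist directory, and `k`/`top_n` values are assumptions, not MoLaD's actual configuration.

```python
# A minimal sketch of the three-step retrieval pipeline; paths, names,
# and parameters are assumptions, not MoLaD's actual wiring.
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import (
    CrossEncoderReranker,
    DocumentCompressorPipeline,
    LLMChainFilter,
)
from langchain_chroma import Chroma
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from langchain_community.llms import LlamaCpp
from langchain_huggingface import HuggingFaceEmbeddings

# Step 1: cosine-similarity search over the ChromaDB vector store
embeddings = HuggingFaceEmbeddings(model_name="thenlper/gte-small")
vectorstore = Chroma(embedding_function=embeddings, persist_directory="./chroma_db")
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Step 2: SmolLMv2-360M-instruct filters out documents irrelevant to the query
# (the model path below is a placeholder)
small_llm = LlamaCpp(model_path="./SmolLMv2-360M-instruct.gguf")
llm_filter = LLMChainFilter.from_llm(small_llm)

# Step 3: the cross-encoder re-ranks the survivors for precision
reranker = CrossEncoderReranker(
    model=HuggingFaceCrossEncoder(model_name="BAAI/bge-small-en"),
    top_n=3,
)

# Chain the filter and reranker behind the similarity retriever
compressor = DocumentCompressorPipeline(transformers=[llm_filter, reranker])
retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=base_retriever,
)
docs = retriever.invoke("What does the PDF say about pricing?")
```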
📊 Flow Charts (Built using lucid.app)
- Langchain: Framework for building LLM applications.
- ChromaDB: Manages and retrieves vectorized document embeddings for RAG.
- Redis: Enables efficient multi-user support and batch processing (see the batching sketch after this list).
- FastAPI: High-performance API framework for serving MoLaD.
- Gunicorn: Python WSGI server for deploying the application.
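To illustrate how Redis can back batch inference, here is a minimal sketch of a list-based request queue. The queue and key names (`molad:requests`, `molad:result:<id>`) and the payload fields are hypothetical, not MoLaD's actual schema.

```python
# A minimal sketch of Redis-backed request batching; queue and key names
# here are hypothetical, not MoLaD's actual schema.
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def worker_loop(batch_size: int = 4, poll_interval: float = 0.05) -> None:
    while True:
        # Pop up to batch_size pending requests in a single round trip
        pipe = r.pipeline()
        for _ in range(batch_size):
            pipe.lpop("molad:requests")
        batch = [json.loads(item) for item in pipe.execute() if item]
        if not batch:
            time.sleep(poll_interval)
            continue
        # Run one batched LLM inference over the collected prompts, then
        # store each result under its request id for the API layer to read
        for req in batch:
            result = {"text": f"(generated response for: {req['query']})"}
            r.set(f"molad:result:{req['id']}", json.dumps(result), ex=300)
```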
Clone this repository and install the required dependencies:
```bash
git clone https://github.com/SahilChachra/MoLaD.git
cd MoLaD
pip install -r requirements.txt
```
Alternatively, pull an NVIDIA image with CUDA preinstalled on Ubuntu 22.04, then install PyTorch with CUDA, llama-cpp-python with CUDA support, and the remaining packages from requirements.txt. It's a bit tricky to set up, but it's a great learning experience.
Copy or clone the code inside the container; the steps to run it remain the same.
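A rough outline of that setup is shown below; the image tag and CUDA versions are assumptions, so match them to your driver.

```bash
# Hypothetical container setup; adjust CUDA versions to match your driver
docker run --gpus all -it -v "$(pwd)":/workspace nvidia/cuda:12.1.0-devel-ubuntu22.04 bash

# Inside the container:
apt-get update && apt-get install -y python3-pip git
pip install torch --index-url https://download.pytorch.org/whl/cu121
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
pip install -r requirements.txt
```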
Launch the server, Redis, and the worker in separate terminals:
- Start the FastAPI server:
```bash
gunicorn app:app --workers 2 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 --timeout 10000 --access-logfile guni.log
```
- Start Redis:
```bash
redis-server
```
- Start the worker:
```bash
python3 worker.py
```
- Modify client_ingest.py and client_infer.py to use the server:
```bash
python3 client_ingest.py   # If you want to use RAG
python3 client_infer.py    # Enable/Disable the RAG and web search tools
```
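For example, a hypothetical client_infer.py payload could look like the sketch below; the endpoint and field names are assumptions, so check the script itself for the real schema.

```python
# Hypothetical inference request; endpoint and field names are assumptions,
# not client_infer.py's actual schema.
import requests

payload = {
    "user_id": "user_123",   # must match the user id used during ingestion
    "query": "Summarize the key points of my PDF",
    "use_rag": True,         # toggle the PDF RAG tool
    "use_websearch": False,  # toggle the Serper web search tool
}
response = requests.post("http://localhost:80/infer", json=payload, timeout=600)
print(response.json())
```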
Make sure you get your own Serper API key from Serper's site to use the web search tool. Create a .env file in the project folder and add the key like so:
```
SERPER_API_KEY=your_api_key_here
```
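Here is a minimal sketch of loading and exercising the key, assuming python-dotenv and LangChain's GoogleSerperAPIWrapper; MoLaD's own web search wiring may differ.

```python
# Minimal sketch, assuming python-dotenv is installed; MoLaD's actual
# web search wiring may differ.
from dotenv import load_dotenv
from langchain_community.utilities import GoogleSerperAPIWrapper

load_dotenv()  # loads SERPER_API_KEY from .env into the environment
search = GoogleSerperAPIWrapper()  # picks up SERPER_API_KEY automatically
print(search.run("current weather in Bengaluru"))
```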
client_ingest.py - If you want to use RAG with your PDF, run this script first; it ingests your data into a vector database and registers a PDF RAG tool against your user ID.
client_infer.py - Modify the payload values to turn the RAG and web search tools on or off.
NOTE - Your PDF RAG tool is saved against your user ID, so make sure you use the same user ID for ingestion and inference.
- Improved Language Model Support: Add more advanced LLMs for broader language capabilities.
- Expanded File Support: Add support for different file types, including PowerPoint presentations and spreadsheets.
- UI Development: Build a user-friendly web interface to enhance accessibility.
- Code Parsers: Add code parsers to handle Python/C++ code as output
This project is licensed under the GPL-3.0 License.
Special thanks to the creators of:
- Langchain for simplifying LLM integrations.
- ChromaDB for efficient vector storage and retrieval.
- Hugging Face for their open-source models like BAAI/bge-small-en and SmolLMv2-360M-instruct.
For issues, questions, or contributions, feel free to open an issue or contact [email protected].