A modular Retrieval-Augmented Generation (RAG) system built with FastAPI, Qdrant, and OpenAI.
The system follows a 5-step RAG workflow (a sketch of extending one of these steps follows the list):
- **Query Routing** (`router.py`) - Determines whether a query can be answered (ANSWER), needs clarification (CLARIFY), or should be rejected (REJECT)
  - Uses an LLM to make the routing decision
  - Extensible through the `BaseRequestRouter` interface
- **Query Reformulation** (`reformulator.py`) - Refines the original query for better retrieval
  - Extracts keywords for hybrid search
  - Implements `BaseQueryReformulator` for custom reformulation strategies
- **Context Retrieval** (`retriever.py`) - Performs hybrid search combining:
  - Semantic search using embeddings
  - Keyword-based search
  - Currently uses Qdrant for vector storage
  - Extensible through the `BaseRetriever` interface
- **Completion Check** (`completion_checker.py`) - Evaluates whether the retrieved context is sufficient to answer the query
  - Returns a confidence score
  - Threshold is customizable through configuration
  - Implements the `BaseCompletionChecker` interface
- **Answer Generation** (`answer_generator.py`) - Generates the final response using the retrieved context
  - Includes relevant citations
  - Provides confidence scoring
  - Extensible through the `BaseAnswerGenerator` interface
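Each step can be swapped out by subclassing its base class. As a minimal sketch of the pattern (the method and type names below are assumptions for illustration, not the repo's actual `BaseRequestRouter` signature), a rule-based router might look like this:

```python
# Hypothetical sketch -- the real BaseRequestRouter interface lives in
# src/router.py and may use different method names.
from enum import Enum


class RouteDecision(Enum):
    # Assumed decision type mirroring the ANSWER/CLARIFY/REJECT outcomes.
    ANSWER = "answer"
    CLARIFY = "clarify"
    REJECT = "reject"


class KeywordRequestRouter:  # would subclass BaseRequestRouter in the repo
    """Routes without an LLM call: rejects queries containing blocked
    terms, asks for clarification on very short queries, else answers."""

    def __init__(self, blocked_terms: set[str], min_words: int = 3):
        self.blocked_terms = blocked_terms
        self.min_words = min_words

    def route(self, query: str) -> RouteDecision:
        words = query.lower().split()
        if any(term in words for term in self.blocked_terms):
            return RouteDecision.REJECT
        if len(words) < self.min_words:
            return RouteDecision.CLARIFY
        return RouteDecision.ANSWER
```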
The system is designed for easy extension and modification:
- **LLM Providers**
  - Currently uses OpenAI
  - Can be extended to support other providers (Anthropic, Bedrock, etc.)
  - Each component uses abstract base classes for provider independence
- **Vector Databases**
  - Currently implements Qdrant
  - Can be extended to support other vector DBs (Pinecone, Weaviate, etc.)
  - New implementations plug in through the abstract `BaseRetriever` interface (see the sketch after this list)
- **Document Management**
  - Flexible document model with metadata support
  - Extensible for different document types and sources
- **Search Strategies**
  - Hybrid search combining semantic and keyword approaches
  - Customizable result merging strategies
  - Extensible for additional search methods
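As an illustration of the vector-database extension point, here is a sketch of a minimal in-memory retriever. The `Document` shape and the `retrieve` method name are assumptions for the example; the actual contract is whatever `BaseRetriever` defines in the repo.

```python
# Hypothetical sketch -- the real BaseRetriever interface may differ.
from dataclasses import dataclass, field


@dataclass
class Document:
    # Assumed document model: raw text plus free-form metadata.
    text: str
    metadata: dict = field(default_factory=dict)


class InMemoryRetriever:  # would subclass BaseRetriever in the repo
    """Toy keyword-only retriever: ranks documents by term overlap with
    the query. A real implementation would also embed the query and
    merge semantic and keyword scores, as the hybrid search above does."""

    def __init__(self, documents: list[Document]):
        self.documents = documents

    def retrieve(self, query: str, top_k: int = 3) -> list[Document]:
        query_terms = set(query.lower().split())
        scored = [
            (len(query_terms & set(doc.text.lower().split())), doc)
            for doc in self.documents
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in scored[:top_k] if score > 0]
```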
Prerequisites:

- Python 3.10+
- Docker and Docker Compose
- OpenAI API key
Setup:

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/legit-rag.git
  cd legit-rag
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file:

  ```bash
  cp .env.example .env
  ```

- Edit `.env` and add your OpenAI API key:

  ```
  OPENAI_API_KEY=your-key-here
  ```

- Start the Qdrant vector database:

  ```bash
  docker-compose up -d
  ```

- Start the API server:

  ```bash
  python -m src.api
  ```
The API will be available at http://localhost:8000
`POST /documents`

```json
{
  "documents": [
    {
      "text": "Your document text here",
      "metadata": {"source": "wiki", "topic": "example"}
    }
  ]
}
```
`POST /query`

```json
{
  "query": "Your question here"
}
```
Example usage from Python:

```python
import requests

# Add documents
docs = {
    "documents": [
        {
            "text": "Example document text",
            "metadata": {"source": "example"}
        }
    ]
}
response = requests.post("http://localhost:8000/documents", json=docs)

# Query
query = {"query": "What does the document say?"}
response = requests.post("http://localhost:8000/query", json=query)
print(response.json())
```
Once the server is running, you can access the API documentation at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Key configuration options in `config.py` (a hypothetical sketch follows the list):
- LLM models for each component
- Vector DB settings
- Completion threshold
- API endpoints and ports
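The file itself is not reproduced here; as a hypothetical sketch of the kinds of options it exposes (every name below is illustrative, not the repo's actual setting):

```python
# Hypothetical sketch of config.py-style options -- names are illustrative.
from dataclasses import dataclass


@dataclass
class RAGConfig:
    # LLM models for each component
    router_model: str = "gpt-4o-mini"
    answer_model: str = "gpt-4o"
    # Vector DB settings
    qdrant_url: str = "http://localhost:6333"
    collection_name: str = "documents"
    # Minimum confidence the completion check must return before answering
    completion_threshold: float = 0.7
    # API settings
    api_host: str = "0.0.0.0"
    api_port: int = 8000
```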
Planned improvements:

- Provider-agnostic LLM interface
- Support for streaming responses
- Additional vector database implementations
- Enhanced document preprocessing
- Caching layer for frequent queries
- Batch document processing
- Advanced result ranking strategies