A Streamlit-based chatbot that uses RAG (Retrieval-Augmented Generation) to answer questions about uploaded PDF documents.

PythonToGo/rag_chatbot

PDF RAG Chatbot

Just for fun; not yet deployed.

Features

  • PDF Upload & Processing: Upload PDF files and automatically process them for RAG
  • Vector Storage: Uses FAISS for efficient document retrieval
  • RAG Integration: Leverages OpenAI's GPT models for intelligent question answering
  • PDF Visualization: View PDF pages as images alongside responses
  • Context Display: See the source documents used to generate answers
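
The retrieval step behind these features can be sketched without any external services. The block below is only an illustration: it swaps FAISS and OpenAI embeddings for a toy word-count vector and plain cosine similarity, and the names `embed`, `cosine`, and `retrieve` are hypothetical, not taken from this repository.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Example document chunks (made up for illustration).
chunks = [
    "FAISS is a library for vector similarity search.",
    "Streamlit turns Python scripts into web apps.",
    "PyMuPDF can render PDF pages as images.",
]
top = retrieve("Which library handles vector similarity search?", chunks, k=1)
```

In the real application the chunks would come from the uploaded PDF, and the ranking would be done by the FAISS index rather than a Python sort.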

Project Structure

chatbot/
├── src/
│   ├── config/
│   │   ├── __init__.py
│   │   └── settings.py          # Configuration and constants
│   ├── core/
│   │   ├── __init__.py
│   │   ├── document_processor.py # PDF processing and vector storage
│   │   └── rag_chain.py         # RAG chain implementation
│   ├── utils/
│   │   ├── __init__.py
│   │   └── pdf_converter.py     # PDF to image conversion
│   ├── ui/
│   │   ├── __init__.py
│   │   └── streamlit_app.py     # Streamlit UI implementation
│   └── __init__.py
├── data/
│   ├── temp_pdfs/              # Temporary PDF storage
│   ├── vector_store/           # FAISS vector database
│   └── pdf_images/             # Converted PDF images
├── tests/                      # Test files
├── docs/                       # Documentation
├── main.py                     # Application entry point
├── requirements.txt            # Python dependencies
└── README.md                   # This file

Installation

  1. Clone the repository:
git clone https://github.com/PythonToGo/rag_chatbot.git
cd rag_chatbot
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Set up environment variables: create a .env file in the project root containing your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here
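
How the application reads the key is not shown here. As a rough, standard-library-only sketch, a `.env` file of `KEY=VALUE` lines could be parsed as below. The helper names `parse_env` and `get_openai_api_key` are hypothetical. Many projects use the python-dotenv package for this; whether this one does is an assumption.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_api_key(env_path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY", "")
    path = Path(env_path)
    if not key and path.exists():
        key = parse_env(path.read_text(encoding="utf-8")).get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it or add it to .env")
    return key
```

Failing early with a clear error tends to be friendlier than letting the first API call fail with an authentication error.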

Usage

  1. Run the application:
streamlit run main.py
  2. Open your browser and navigate to the provided URL (usually http://localhost:8501)
  3. Upload a PDF file using the file uploader
  4. Ask questions about the uploaded PDF in the text input field
  5. View the generated responses and related document context

Configuration

You can modify the application settings in src/config/settings.py:

  • Model Settings: Change the embedding and chat models
  • Document Processing: Adjust chunk size and overlap
  • Retrieval Settings: Modify the number of retrieved documents
  • Image Conversion: Change DPI settings for PDF to image conversion
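
To make these four groups concrete, here is a hedged sketch of what such a settings object might look like, plus a minimal character-based splitter showing how chunk size and overlap interact. Every field name and default value below is an illustrative assumption, not the actual contents of src/config/settings.py, and `split_text` is a simplified stand-in for whatever splitter the project uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # All names and defaults are assumptions made for illustration only.
    embedding_model: str = "text-embedding-3-small"
    chat_model: str = "gpt-4o-mini"
    chunk_size: int = 1000      # characters per chunk
    chunk_overlap: int = 200    # characters shared by neighbouring chunks
    retriever_k: int = 4        # number of chunks retrieved per question
    image_dpi: int = 150        # resolution for PDF page images


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

A larger overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of storing more near-duplicate text in the vector store.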

Dependencies

  • Streamlit: Web application framework
  • LangChain: RAG framework and document processing
  • OpenAI: Language models and embeddings
  • FAISS: Vector similarity search
  • PyMuPDF: PDF processing and image conversion

Development

Running Tests

# Add test files to the tests/ directory
python -m pytest tests/
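
As a starting point, a test file might look like the sketch below. pytest collects functions whose names start with `test_` from files named `test_*.py`. The file name and the helper `page_image_path` are hypothetical and defined inline so that the example is self-contained. A real test would import from src/ instead.

```python
# tests/test_page_image_path.py  (hypothetical file name)
from pathlib import Path


def page_image_path(pdf_name: str, page_number: int, out_dir: str = "data/pdf_images") -> Path:
    """Hypothetical helper: where the image for one PDF page would be written."""
    if page_number < 1:
        raise ValueError("page_number is 1-based")
    stem = Path(pdf_name).stem
    return Path(out_dir) / f"{stem}_page_{page_number}.png"


def test_uses_pdf_stem_and_page_number():
    assert page_image_path("report.pdf", 3).name == "report_page_3.png"


def test_rejects_page_zero():
    try:
        page_image_path("report.pdf", 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for page 0")
```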

Code Structure

The application follows a modular architecture:

  • DocumentProcessor: Handles PDF loading, chunking, and vector storage
  • RAGChain: Manages the RAG pipeline and question processing
  • PDFConverter: Converts PDF pages to images for display
  • StreamlitApp: Main UI application with clean separation of concerns
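
One step that a component like RAGChain typically performs is combining the retrieved chunks with the user's question before calling the chat model. The sketch below shows that step in isolation. The `Chunk` type, `build_prompt`, `source_pages`, and the prompt wording are all assumptions made for illustration, not this project's code.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int  # 1-based page number the chunk came from


def build_prompt(question: str, context: list[Chunk]) -> str:
    """Combine retrieved chunks and the user question into one prompt string."""
    blocks = [f"[page {c.page}] {c.text}" for c in context]
    return (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n".join(blocks) + "\n\n"
        f"Question: {question}"
    )


def source_pages(context: list[Chunk]) -> list[int]:
    """Unique page numbers in ascending order, e.g. for a context display."""
    return sorted({c.page for c in context})
```

Keeping the page number with each chunk is what would let the UI show the matching PDF page image next to the answer.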

Adding New Features

  1. Create new modules in the appropriate directory (core/, utils/, ui/)
  2. Update configuration in src/config/settings.py if needed
  3. Add tests in the tests/ directory
  4. Update this README with new features

License

MIT License, Copyright PythonToGo 2025.
