PDF RAG Chatbot

A hobby project, built just for fun; not yet deployed.

A Streamlit-based chatbot that uses RAG (Retrieval-Augmented Generation) to answer questions about uploaded PDF documents.

Features

PDF Upload & Processing: Upload PDF files and automatically process them for RAG
Vector Storage: Uses FAISS for efficient document retrieval
RAG Integration: Leverages OpenAI's GPT models for intelligent question answering
PDF Visualization: View PDF pages as images alongside responses
Context Display: See the source documents used to generate answers

Project Structure

chatbot/
├── src/
│   ├── config/
│   │   ├── __init__.py
│   │   └── settings.py          # Configuration and constants
│   ├── core/
│   │   ├── __init__.py
│   │   ├── document_processor.py # PDF processing and vector storage
│   │   └── rag_chain.py         # RAG chain implementation
│   ├── utils/
│   │   ├── __init__.py
│   │   └── pdf_converter.py     # PDF to image conversion
│   ├── ui/
│   │   ├── __init__.py
│   │   └── streamlit_app.py     # Streamlit UI implementation
│   └── __init__.py
├── data/
│   ├── temp_pdfs/              # Temporary PDF storage
│   ├── vector_store/           # FAISS vector database
│   └── pdf_images/             # Converted PDF images
├── tests/                      # Test files
├── docs/                       # Documentation
├── main.py                     # Application entry point
├── requirements.txt            # Python dependencies
└── README.md                   # This file

Installation

Clone the repository:

git clone https://github.com/PythonToGo/rag_chatbot.git
cd rag_chatbot

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Set up environment variables: Create a .env file in the root directory with your OpenAI API key:

OPENAI_API_KEY=your_openai_api_key_here
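
As an illustration of how the key might be picked up at startup, here is a minimal standard-library sketch that parses KEY=VALUE lines from a .env file and fails early if OPENAI_API_KEY is missing. The helper names load_env_file and require_api_key are hypothetical and not taken from this repository; the project itself may rely on a library such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines (hypothetical helper, not the project's code)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(path=".env"):
    """Return the key from the process environment or the .env file, else raise."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return key
```

Checking for the key before the UI starts gives a clear error message up front rather than a failure on the first question.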

Usage

Run the application:

streamlit run main.py

Open your browser and navigate to the provided URL (usually http://localhost:8501)
Upload a PDF file using the file uploader
Ask questions about the uploaded PDF in the text input field
View the generated responses and related document context

Configuration

You can modify the application settings in src/config/settings.py:

Model Settings: Change the embedding and chat models
Document Processing: Adjust chunk size and overlap
Retrieval Settings: Modify the number of retrieved documents
Image Conversion: Change DPI settings for PDF to image conversion
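
To make the chunk size and overlap settings concrete, the sketch below shows one possible shape for such settings and a simplified character-window splitter. Every name and default value here (Settings, chunk_size, chunk_overlap, top_k, image_dpi, split_text) is an assumption for illustration; the actual contents of src/config/settings.py are not shown in this README, and the project most likely delegates splitting to LangChain rather than a hand-written function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Illustrative names and defaults only; not the project's real values.
    chunk_size: int = 1000     # characters per chunk
    chunk_overlap: int = 200   # characters shared by neighbouring chunks
    top_k: int = 4             # number of retrieved documents
    image_dpi: int = 150       # DPI for PDF-to-image conversion


def split_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


settings = Settings()
chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
```

A larger overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of storing more chunks.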

Dependencies

Streamlit: Web application framework
LangChain: RAG framework and document processing
OpenAI: Language models and embeddings
FAISS: Vector similarity search
PyMuPDF: PDF processing and image conversion

Development

Running Tests

# Add test files to the tests/ directory
python -m pytest tests/

Code Structure

The application follows a modular architecture:

DocumentProcessor: Handles PDF loading, chunking, and vector storage
RAGChain: Manages the RAG pipeline and question processing
PDFConverter: Converts PDF pages to images for display
StreamlitApp: Main UI application; keeps interface code separate from the processing and retrieval logic
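
The retrieve-then-answer flow behind these components can be sketched in plain Python. This is a toy stand-in under stated assumptions: word counts replace OpenAI embeddings, a sorted list replaces the FAISS index, and the function names (embed, cosine, retrieve, build_prompt) are hypothetical rather than the methods of DocumentProcessor or RAGChain.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; the real app would use OpenAI embeddings."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (stand-in for FAISS search)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the chat model."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "the cat sat on the mat",
    "stocks fell sharply today",
    "a cat chased a mouse",
]
top = retrieve("where is the cat", chunks, k=1)
prompt = build_prompt("where is the cat", top)
```

The retrieved chunks are the same material the app would surface as source context next to each answer.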

Adding New Features

Create new modules in the appropriate directory (core/, utils/, ui/)
Update configuration in src/config/settings.py if needed
Add tests in the tests/ directory
Update this README with new features
