This project is a document retrieval application that uses Retrieval-Augmented Generation (RAG) to let users interact with uploaded PDF documents. Users ask questions about a document's content, and a Large Language Model (LLM) answers using the passages retrieved from the uploaded PDFs.
- PDF Upload: Users can upload PDF files for processing.
- AI Interaction: Ask questions about the content of the uploaded PDFs.
- Machine Learning Integration: Uses LangChain and Hugging Face Transformers models for document processing and question answering.
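The question-answering flow described above can be sketched roughly as follows: the PDF text is split into chunks, the most relevant chunks are retrieved, and they are placed into the prompt sent to the LLM. This is a minimal illustration, not the project's actual code; `split_into_chunks`, `build_prompt`, the chunk sizes, and the prompt wording are all hypothetical.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (sizes are illustrative)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved passages and the user's question into one LLM prompt."""
    # Number each passage so the model (and a reader) can tell them apart.
    context = "\n\n".join(
        f"[{n}] {chunk}" for n, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The overlap between chunks is a common way to avoid cutting a sentence's context in half at a chunk boundary; the real application may chunk differently.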
- Backend: FastAPI
- Frontend: Streamlit
- Machine Learning: LangChain, Hugging Face Transformers
- Vector Store: FAISS for efficient similarity search
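To illustrate what the similarity search in the vector store does, here is a toy, pure-Python stand-in. It does not use FAISS or a real embedding model: `embed` is a hypothetical word-count "embedding", and `top_k` scans every chunk, whereas FAISS performs the same kind of nearest-neighbour lookup over dense vectors at much larger scale.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would use a neural model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)
    return ranked[:k]
```

For example, `top_k("what is the invoice total", chunks, k=1)` would return the chunk that shares the most weighted vocabulary with the question.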
- Clone the repository:
    git clone https://github.com/yourusername/chatpdf.git
    cd chatpdf
- Create a virtual environment and activate it:
    python -m venv .venv
    source .venv/bin/activate  # On Windows use .venv\Scripts\activate
- Install the required packages:
    pip install -r requirements.txt
- Start the FastAPI server:
    uvicorn app.main:app --reload
- Open the Streamlit app in another terminal:
    streamlit run app/streamlit_app.py
- Navigate to http://localhost:8501 in your web browser to access the application.
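If the page does not load, a quick check that both processes are listening can help. The sketch below is a hypothetical helper, not part of the project. It assumes the backend runs on port 8000 (uvicorn's default when no `--port` is given) and the frontend on port 8501, as in the URL above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Check the two ports this setup is assumed to use."""
    services = {"FastAPI backend": 8000, "Streamlit frontend": 8501}
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

An open port only shows that something is listening there; it does not confirm that the application itself is healthy.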
- GET /: Returns a welcome message.
- POST /upload_pdf/: Uploads a PDF file for processing.
  - Request: Multipart form data with the PDF file.
  - Response: Success message upon successful upload and processing.
- POST /ask/: Asks a question about the uploaded PDF.
  - Request: JSON body with the question.
  - Response: The answer to the question based on the PDF content.
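The endpoints above could be called from a small Python client such as the sketch below, which uses only the standard library. Several details are assumptions because the endpoint descriptions do not specify them: the base URL (`http://localhost:8000`, uvicorn's default), the multipart form field name `file`, and the JSON key `question`. Adjust them to match the actual FastAPI handlers.

```python
import json
import uuid
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default host and port


def build_upload_request(
    pdf_bytes: bytes, filename: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a multipart/form-data POST for /upload_pdf/ (field name "file" is assumed)."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/upload_pdf/",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def build_ask_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST for /ask/ (the "question" key is assumed)."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, a typical session would read a PDF with `open(path, "rb").read()`, pass it through `send(build_upload_request(...))`, and then call `send(build_ask_request("..."))` for each question.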
To run the tests, use the following, assuming the test suite is written for pytest:
    pytest